Microsoft Bing made waves in recent days by touting its new integration with the ChatGPT AI text generation tool. What does this mean in practice? Bing has a few billion web pages in its search index. Microsoft can feed these pages into ChatGPT, which will then process the information. The idea is that when people search, their questions can purportedly be answered right on Microsoft’s Bing website by ChatGPT, which will use information skimmed from websites written by human beings. The author of the WorldOfMatthew blog described big tech’s ideal state of affairs aptly with respect to Google’s Bard, which promises to operate under the same principles as Bing’s ChatGPT integration:

The problem is that if Google answers the question asked[,] the average user will not click the “Learn more” links. That means far-less traffic for websites that rely on the Google monopoly for traffic…

He continues…

Basically, it used to be a webmaster would allow Google to crawl their website for search traffic. Now Google will crawl websites and effectively steal that content for themselves.

That is, both Microsoft and Google seek to use AI to generate their own content from the human-written content in their search indexes, with the express purpose of keeping searchers within the confines of Microsoft and Google instead of letting them visit websites beyond the two companies’ direct control.

According to Mr. Avram Piltch at Tom’s Hardware, there are early indications that Google’s Bard is not showing the sources of its AI-generated material at all. Bing is apparently slightly better under the principle that “all is relative”:

Bing’s new chatbot implementation is slightly better than Google’s in that it actually shows sources, but it buries them in tiny footnotes, some of which aren’t even visible unless you click a button to expand the answer.

(I hope you note that unlike the AI tools we describe above, I always make a point of clearly citing to my sources.)

Microsoft’s move, along with related efforts by Google to integrate the Bard AI text generation tool into Google Search, has inspired much discussion, some of it positive. I will say that despite my citing to critiques of both Microsoft and Google, critiques with which I largely agree, I do not discount the possibility that similar AI tools could be used in ethical ways, perhaps with proper, clear, and transparent citations and link-backs and webmaster opt-in, to help users who may have particular use cases for them. However, the purpose of this article is not to dive into the full implications of early-stage AI search integrations that have barely rolled out, much less been tested. My purpose here is different: I pose a question to Microsoft as the webmaster of a site that as of February 12, 2023, and since January 14, 2023, has been blacklisted from appearing in Bing’s search results without any explanation, communication, or recourse. Pertinent to my question is the fact that The New Leaf Journal is still in Bing’s web search index – but is being explicitly blocked from appearing to searchers.

Results for "site:thenewleafjournal.com" search in Bing as of February 13, 2023, showing that all results from The New Leaf Journal domain are indexed by Bing, but blocked from appearing.

Let us imagine that there is a site that Microsoft has in Bing’s web search index but explicitly refuses to show to web searchers for one unknown reason or another. The author at WorldOfMatthew noted with respect to Google that “[t]here is zero sign that Google will ask for permission to use your content for their AI instant answers or compensate creators for their work that Google is now using.” The same is true of Microsoft Bing. Now I know that Microsoft will not answer my question; it has not answered my question about why our site is being blacklisted from Bing. (Microsoft is very responsive to Chinese Communist Party officials, but far less responsive to citizens and nationals of the United States who have questions about its indexing decisions.) But let us play make-believe just like the people who pretend that ChatGPT produces original material. Moreover, let us ask questions one by one since asking multi-part questions tends to yield no answers.

  1. If a site is in Microsoft Bing’s index but is being blocked from appearing in Microsoft Bing’s web search results, can ChatGPT use information from that site to generate answers to search queries?
  2. If ChatGPT uses information from a site that is in Bing’s index but is being blacklisted from appearing in search results to inform its answer to a visitor’s question, will Bing credit the site with a clear link to the source in ChatGPT’s purportedly original artificial intelligence answer?

These are simple questions. Regardless of the merits of Microsoft’s plans for using ChatGPT, it seems obvious that it should at least limit the tool to borrowing content from sites that it allows to appear in Bing’s search results. If Microsoft is blocking a domain from appearing in its search results, regardless of whether the reason is legitimate, illegitimate, or wholly unexplained, it should ensure that ChatGPT is not using the site to inform its answers.

Of course, Microsoft, like other unaccountable big tech conglomerates, is far more likely than not to have its cake and eat every crumb too. Now in our case, we are a small website – and it is highly unlikely that our material will be substantially informing Bing’s ChatGPT implementation or Google’s Bard, although we do have some of the most in-depth online articles on specific subjects such as English-language discourse on the origin of tsuki ga kirei, English-language ONScripter visual novels and related issues, and the poetry of Charlotte Becker. Nevertheless, there are principles at stake in what the big tech search duopoly is doing – with principles being a foreign concept to both Google and Microsoft.

Now readers may note that I focused almost entirely on Bing at the expense of Google, notwithstanding some indications that Google’s Bard may be more predatory than Microsoft’s ChatGPT integration. The reason is that I have far more confidence that Google will not randomly blacklist websites with no explanation than I have that Bing will not. So long as that is the case, I can at least be assured that web searchers who are not content to take Google’s AI at its word can navigate to my site from Google. This is clearly not the case with Bing. Moreover, Google’s webmaster tools are somewhat helpful, while Bing’s are genuinely inferior to Yandex’s, let alone Google’s. Finally, despite the fact that Google receives the brunt of attention because it is the dominant search engine everywhere outside of Russia and China, Bing is of special importance because most of the so-called alternatives rely on its search index, and it may to some extent or another affect others that are not wholly reliant.