Photo: Carl Bergstrom

LESSON 9

Blue Links Matter

Developed by Carl T. Bergstrom and Jevin D. West


Large language models are not designed to retrieve information. They are designed to generate plausible text. Yet they are quickly replacing traditional search algorithms as the way we find answers to our questions, if only because search engines, including Google, are imposing them on us.

Let's consider four issues that arise in using LLMs in search: accuracy, sourcing, consistency, and fragility.

Accuracy. For over two decades, you could type anything you wanted to know about into Google, and immediately get a list of relevant websites. For example, suppose we wanted to know whether any US presidents attended the University of Washington, where we both teach.

If you typed "which US presidents attended the University of Washington" into the search bar, you might have gotten a list of websites such as these. You could follow the links, and see that there are no UW graduates among the ranks of United States presidents.

Illustration of a Google search query that returns four blue-link results:

CollegeXpress (collegexpress.com): "Colleges and Universities Attended by US Presidents." Everyone starts somewhere, even our nation's leaders! If you have your sights set on the White House, consider these colleges attended by US presidents.

Statista (statista.com): "U.S. Presidents: universities attended 1751-1991." Jul 4, 2024. 37 of the United States' 45 presidents (officially 46, as Grover Cleveland is counted as both the 22nd and 24th president) attended a university, college or ...

College Confidential (collegeconfidential.com): "Colleges Attended by Presidents." The three presidents who attended William & Mary are Thomas Jefferson, James Monroe and John Tyler. Jefferson and Tyler both graduated from the university while ...

Pioneer Academics (pioneeracademics.com): "US Presidents and Higher Education: Charting 45 Unique ..." Feb 18, 2022. Four received degrees from this college, including, as mentioned, Washington, alongside Thomas Jefferson, James Monroe, and John Tyler. ...

You can still ask Google such questions, but these familiar blue links no longer appear at the top of your search results. To reach them you have to scroll past an "AI Overview" written by Google's Gemini LLM.

Worse still, like all LLMs, Gemini often makes up utter nonsense about the most basic matters of fact. Take our query about US presidents. Google incorrectly reports that four (Washington, Jefferson, Monroe, and Tyler) graduated from the University of Washington. Three of the four died before the UW was even founded.

This is a big problem, because people tend to trust Google, and if you're not an expert, you are unlikely to question what it tells you. Or perhaps you are an expert, but don't realize how carefully you have to double-check everything Google tells you these days.

Sourcing. When LLMs pull information directly from their training data, their responses don't give you any way to check sources. They don't know their sources — and in fact they don't have sources in the way we typically think about them. They just have a vast soup of training data they use to predict what word should come next.

More recent models can also pull information from web searches and use these search queries to provide sources. At present, however, this process suffers from the same fabrications and confident false assertions that characterize LLMs more generally. One study found that commercial LLMs cite incorrect or non-existent sources from 37% (Perplexity.ai) to 94% (Grok) of the time.

This is a big problem. A key theme in our book Calling Bullshit is the need to investigate claims by tracking them back to their sources. But the whole point of replacing search engines with large language models is to eliminate sourcing. The CEO of the Perplexity AI search platform argues that AI has made blue links obsolete.

Google Query: how many us presidents graduated from the university of washington? AI Overview: Four US presidents graduated from the University of Washington: George Washington, Thomas Jefferson, James Monroe, and John Tyler.

Google query conducted 12/27/2024.


But without accurate sources, you are driving blind.

Consistency. Information retrieval systems need to be consistent. LLMs are anything but.

Imagine an enchanted encyclopedia that told you something different each time you looked up a topic. It might be fun to play with, but it wouldn't be a reliable information source.

When you query ChatGPT for facts about the world, this is essentially what you are getting. Ask the same question three times and you may get three different answers.

Recall that LLMs work by predicting likely next words in a string of text. It turns out that if you let them always pick the single most likely next word, their prose becomes stilted, they get stuck in loops, and they don't do a good job of creating human-like responses. The work-around is to add some randomness to the process: most often they pick the most likely next word, a little less often the second-most likely, and so on.

This may be desirable when using a language model for a conversational partner, but it's not what you want an information retrieval system to do.
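The difference between always taking the top word and sampling from the distribution can be sketched in a few lines of Python. The probabilities below are invented for illustration (real models choose among tens of thousands of tokens), but the mechanism is the same: greedy decoding is deterministic, while sampling yields different outputs on different runs.

```python
# Toy sketch (not any vendor's actual decoder) of why identical prompts
# can yield different answers. An LLM assigns a probability to each
# candidate next token; sampling from that distribution, rather than
# always taking the top token, injects randomness into every response.
import random

# Hypothetical next-token distribution after a prompt like
# 'The author of "A National Prayer for Peace" is ...'
next_token_probs = {
    "William": 0.40,   # most likely candidate
    "Maya": 0.35,      # close runner-up
    "Jonathan": 0.25,  # still plausible
}

def greedy_pick(probs):
    """Always return the single most likely token: deterministic."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Draw a token in proportion to its probability: stochastic."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random()
print(greedy_pick(next_token_probs))   # "William" every time
# Repeated sampled picks wander among all three candidates:
print([sample_pick(next_token_probs, rng) for _ in range(10)])
```

Run the last line a few times and you get a different list each time, which is exactly the inconsistency described above.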

Some ChatGPT 4o queries yield different answers each time you run them.

ChatGPT session.

Q: Who is the author of "a national prayer for peace"?

A (1/3): The author of "A National Prayer for Peace" is often attributed to William Inge, an Anglican cleric and writer who was also the Dean of St. Paul's Cathedral in London during the early 20th century. His works often reflected his spiritual and intellectual engagement with contemporary social issues, and he was sometimes referred to as "The Gloomy Dean" for his pessimistic tone. Let me know if this aligns with the context you're referring to, or if you'd like me to explore a different attribution!

A (2/3): The author of "A National Prayer for Peace" is Dr. Maya Angelou, the renowned poet, author, and civil rights activist. She wrote this poem for the United States' National Day of Prayer in 1995, calling for unity, healing, and peace among people of all backgrounds. The prayer reflects her characteristic themes of hope, resilience, and the shared humanity that transcends divisions.

A (3/3): The author of "A National Prayer for Peace" is Jonathan Cahn, a Messianic Jewish pastor and author. This prayer is often associated with his broader works on spirituality, revival, and calls for national repentance.

Consecutive ChatGPT 4o queries 1/4/2025


Fragility. LLMs can be extraordinarily fragile with respect to the precise inputs they are given. Here we ask Google for help fact-checking a claim.

Google's AI Overview erroneously assures us that Ed Sheeran has a song called "The Crow".

Google query: does ed sheeran have a song called "the crow" AI Overview Yes, Ed Sheeran has a song called "The Crow". Information. Song: "The Crow" Album: Part of Sheeran's new album, No.

Google query 1/3/2025


Asking the same question but removing the quotation marks around the song title, I get the opposite answer.

In this case, the difference is replicable. We asked repeatedly, and each time Google's Gemini said yes if we put "the crow" in quotes and no otherwise.

Some people will argue that LLMs are great for information retrieval, and if you aren't getting good answers it is because you don't know how to ask the right questions. This strikes us as a powerful counterexample. How would anyone know, in advance, that you have to omit quotation marks around a title?

Google query: does ed sheeran have a song called the crow AI Overview No, Ed Sheeran does not have a song called "the crow", but he has written and performed songs for other artists, including: "Little Things" for One Direction, "Lay It All on Me" for Rudimental, "When Christmas Comes Around" for Matt Terry

Google query 1/3/2025


So why does this happen? As we discussed in Lesson 5, it's difficult or impossible to reverse engineer LLMs to get precise answers.

But we can understand the general problem. For us, the statements "Ed Sheeran has a song called The Crow" and "Ed Sheeran doesn't have a song called The Crow" feel like polar opposites.

Diagram of a 3D space. Discs represent claims. Discs labeled "Ed Sheeran doesn't have a song called The Crow" and "Ed Sheeran has a song called The Crow" are at polar opposite ends of the horizontal axis. Three other discs are scattered elsewhere at a distance: "The mitochondrion is the powerhouse of the cell", "Time flies", and "The Lions will win the Super Bowl".

How a human sees the world: "X is true" and "X is false" are polar opposites.


But recall that LLMs encode strings of words in high-dimensional spaces. This is a bit of a simplification, but for the LLM, these statements may be very close to one another in that high-dimensional space. Both involve Ed Sheeran. Both involve a song called The Crow. What fraction of English utterances involve either, let alone both? To the LLM, whether you throw in the word "doesn't" is almost trivial.

Diagram of a 3D space. Discs represent claims. Discs labeled "Ed Sheeran doesn't have a song called The Crow" and "Ed Sheeran has a song called The Crow" are very close together. Three other discs are scattered elsewhere at a distance: "The mitochondrion is the powerhouse of the cell", "Time flies", and "The Lions will win the Super Bowl".

How an LLM sees the world: "X is true" and "X is false" are both about X, so they are very close together.


Predictive text machines like LLMs measure the distance between statements very differently from the way people do. Claims that seem diametrically opposed to us may seem almost identical to them. As a result, they can be wildly inconsistent in their responses.
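A toy model makes the point concrete. The sketch below uses simple bag-of-words vectors rather than a real LLM's learned embeddings, so it is only an illustration of the idea, but it shows how a distance measure built on shared words places the two opposing claims almost on top of each other while an unrelated claim lands far away.

```python
# Toy illustration (bag-of-words, not a real LLM embedding): in a space
# built from word overlap, "X is true" and "X is false" land almost on
# top of each other, because they share nearly every word.
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences
    (1.0 = identical direction, 0.0 = no shared words)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm

has = "Ed Sheeran has a song called The Crow"
hasnt = "Ed Sheeran doesn't have a song called The Crow"
unrelated = "The mitochondrion is the powerhouse of the cell"

print(cosine_similarity(has, hasnt))      # high: the claims share almost every word
print(cosine_similarity(has, unrelated))  # much lower: few shared words
```

Real embedding models are far more sophisticated than word counting, but the underlying geometry is the same: proximity reflects shared content, not shared truth value.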

And then there are the broader consequences for the web.

Blue links direct traffic to content producers: journalists, artists, authors, bloggers, professors. LLM summaries hijack that traffic.

We anticipate powerful, damaging spillover effects. If no one follows web links, there will be few incentives for people to create high-quality content in the first place.

Meanwhile large language models make it easy to produce low-quality content at the touch of a button. Quality human-created information may be swamped by AI-generated bullshit.

Search engines that ingest and regurgitate content instead of linking to external pages are killing the golden goose that put them in business in the first place.

So in general, LLMs are the wrong tool for information retrieval.

Yet they can be very helpful under the right circumstances. Sometimes you know what you're looking for, but you don't know what it's called or what keywords to use. An LLM can help you out. Describe what you're looking for in plain language, and ask what term to use in a search query.

Here's an example of how this approach helped us find a mathematical concept we vaguely remembered from college. Once we learned that the key term was "power mean", it was easy to find a detailed and trustworthy explanation online.

Q: The arithmetic mean, geometric mean, and harmonic mean are each special cases of a more general family of measures indexed by a single parameter. Tell me about that family, the parameter in question, and the associated mathematics.

ChatGPT: The arithmetic mean, geometric mean, and harmonic mean are special cases of the power mean, also known as the generalized mean. The power mean is a family of averages indexed by a parameter p, which determines the type of mean calculated. The associated mathematics is as follows.

Definition of the power mean: for a set of non-negative real numbers x_1, x_2, ..., x_n and real p ≠ 0,

M_p(x_1, x_2, ..., x_n) = ( (1/n) Σ_{i=1}^{n} x_i^p )^{1/p},

and for p = 0, the mean is defined as the geometric mean:

M_0(x_1, x_2, ..., x_n) = ( Π_{i=1}^{n} x_i )^{1/n}.

ChatGPT 4o query 11/14/2024
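And once you have the right term, the definition is easy to check for yourself rather than taking the LLM's word for it. Here is a minimal sketch of the power mean; the sample values are our own, chosen so the familiar special cases are easy to verify by hand.

```python
# Minimal sketch of the power (generalized) mean:
# for p != 0: M_p = ((1/n) * sum(x_i**p)) ** (1/p)
# for p == 0: the geometric mean, the limit of M_p as p -> 0.
import math

def power_mean(xs, p):
    """Power mean of positive numbers xs with parameter p."""
    n = len(xs)
    if p == 0:
        return math.prod(xs) ** (1 / n)  # geometric mean
    return (sum(x ** p for x in xs) / n) ** (1 / p)

xs = [1.0, 4.0, 16.0]
print(power_mean(xs, 1))    # p = 1: arithmetic mean, (1+4+16)/3 = 7.0
print(power_mean(xs, 0))    # p = 0: geometric mean, 64**(1/3) ≈ 4.0
print(power_mean(xs, -1))   # p = -1: harmonic mean, 3/1.3125 ≈ 2.286
```

Setting p = 2 gives the quadratic mean (root mean square), and as p grows toward infinity the power mean approaches max(xs).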


PRINCIPLE
LLMs generate plausible text; they are not information retrieval systems. Not only do LLMs sometimes fabricate incorrect answers, they also obscure the information sourcing—the blue links—that are part and parcel of traditional search.

DISCUSSION
Imagine that in 20 years, we rely on AI for all of our internet search needs and blue links to primary sources are just a distant memory. What, if anything, will we be missing?

Photo: Carl Bergstrom


VIDEO

Coming Soon.