
LESSON 5
Hard to understand, harder to fix
Developed by Carl T. Bergstrom and Jevin D. West


LLMs are not traditional computer programs.
Their capacities aren’t explicitly encoded in step-by-step instructions. Rather, they are a form of machine learning system that exhibits complex emergent behaviors when trained on mind-bogglingly large data sets.

What is machine learning? Answering that question in detail would require an entire course. But think about it this way:
When you develop a traditional computer program, you plan out in advance what you want it to do. Then you write computer code to do precisely that.
A traditional program of this sort takes data as an input, operates on that data, and finally generates output.
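To make the contrast concrete, here is a minimal sketch of such a program, written in Python (our choice here; the lesson itself is language-agnostic). Every step is spelled out by the programmer in advance:

```python
# A traditional program: every operation is written out explicitly.
# Input -> explicit operations -> output.

def fahrenheit_to_celsius(readings):
    """Convert a list of Fahrenheit readings to Celsius."""
    return [(f - 32) * 5 / 9 for f in readings]

if __name__ == "__main__":
    data = [32.0, 68.0, 98.6]             # input data
    result = fahrenheit_to_celsius(data)  # operate on the data
    print(result)                         # output: [0.0, 20.0, 37.0]
```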
Bergstrom and West (2020) Calling Bullshit
Machine learning works differently. For example, suppose you want a program that can distinguish images of cats from images of dogs. You don't explicitly tell it what to look for to draw the distinction. Instead, you collect a large number of images, labeled "cat" or "dog". These are your training data. You feed that data into a learning algorithm, and that learning algorithm writes your program for you. Finally, you can put new, unlabeled images—the test data—into the program. If all goes well, it will be able to correctly identify each.
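As an illustration, here is a minimal sketch of that workflow in Python using scikit-learn. A real classifier would learn from pixels; to keep the sketch self-contained, we stand in two invented numeric features (weight and ear length) for each image, and the numbers themselves are made up:

```python
# A machine learning program: we supply labeled examples, and the learning
# algorithm -- not a human -- works out the rule that separates the classes.
from sklearn.tree import DecisionTreeClassifier

# Training data: made-up (weight_kg, ear_length_cm) features standing in
# for real images, each labeled "cat" or "dog".
X_train = [[4.0, 6.5], [3.5, 7.0], [5.0, 6.0],      # cats
           [20.0, 10.0], [30.0, 12.0], [8.0, 9.5]]  # dogs
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

# The learning algorithm "writes the program for us" by fitting the model.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Test data: new, unlabeled examples the model has never seen.
X_test = [[4.2, 6.8], [25.0, 11.0]]
print(model.predict(X_test))  # expected: ['cat' 'dog']
```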
Bergstrom and West (2020) Calling Bullshit
This approach is very powerful, but it is often difficult to understand exactly what the resulting programs are doing or precisely how they work.
Large language models are created using a machine learning approach. Instead of predicting whether an image is a cat or a dog, they predict what words are likely to come next in a string of text. The training data are massive amounts of text scraped from the internet.
Unlike in our cats and dogs example, labels are unnecessary because the aim is just to predict the next word from the previous words. The machine simply reads up to some point, challenges itself to predict what comes next, checks to see if it was right, adjusts its parameters accordingly, and iterates.
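To give a feel for the idea on a drastically smaller scale, here is a toy next-word predictor in Python: a bigram model that simply counts which word most often follows each word in its training text. The tiny corpus is invented for illustration; real LLMs use neural networks trained on vastly more text, not raw counts:

```python
# A toy next-word predictor: no labels needed, because the text itself
# supplies the answers. For each word, tally which word follows it.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat the cat sat on the rug the cat chased the dog"
)  # invented toy corpus; real training data is a large slice of the internet

words = training_text.split()
following = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    following[current_word][next_word] += 1  # count what comes next

def predict_next(word):
    """Return the word most often seen after `word` in training."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'cat' (follows "the" most often)
print(predict_next("sat"))  # 'on'
```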
The result is a gigantic model in which the computer has learned to set approximately a trillion parameters. And somehow, that gives us a machine that can...
Translate Old English into modern Italian...
ChatGPT-4o query, 12/7/2024
Debug FORTRAN code...
ChatGPT-4o query, 12/7/2024
Hold a passable conversation in the Klingon language...
ChatGPT query, 12/7/2024
Convert data from one format to another...
ChatGPT query, 12/7/2024
...summarize meeting notes, and much more.
ChatGPT query, 12/7/2024
With a trillion learned parameters interacting in ways that no human planned or designed, it is difficult to understand how these machines work.
It’s even more difficult to fix things when they fail.
LLMs fabricate, or “hallucinate”, prolifically—but we have little ability to explain why they do so, let alone how they settle upon the specific falsehoods they generate.
Eliminating these behaviors may be outright impossible.
Because we lack an adequate model of how LLMs do what they do, we cannot debug them the same way we debug a conventional program.
We can only play Whac-A-Mole in trying to fix their flaws with after-the-fact patches layered atop the existing architecture.
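As a caricature of what such a patch looks like, here is a hypothetical sketch in Python. The generate function stands in for a model we cannot see inside, and the blocked claim is invented; the point is that the patch suppresses one known symptom without touching its cause:

```python
# A caricature of "Whac-A-Mole" patching (hypothetical example): we cannot
# reach inside the model to remove the cause of a known failure, so we bolt
# a filter onto the outside and catch that one symptom after the fact.

KNOWN_BAD_CLAIMS = [
    "the moon landing was staged",  # invented example of a known fabrication
]

def generate(prompt: str) -> str:
    """Stand-in for an opaque LLM call; hard-coded here for illustration."""
    return "As everyone knows, the moon landing was staged."

def patched_generate(prompt: str) -> str:
    # The patch sits atop the existing architecture: it does not change
    # why the model fabricates, only suppresses outputs we already know about.
    reply = generate(prompt)
    if any(claim in reply.lower() for claim in KNOWN_BAD_CLAIMS):
        return "I'm not able to answer that reliably."
    return reply

print(patched_generate("Tell me about the moon landing."))
```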
Why does that matter?
Tech leaders promise that these systems will get better over time. But it will be very hard to get LLMs to stop doing some of the things that make their use most problematic.

PRINCIPLE
Because LLMs are highly complex systems with emergent behavior, their actions are difficult to interpret and harder still to debug.
DISCUSSION
We've considered the challenge of debugging LLMs. Why else might we want to understand why they do what they do?

VIDEO