
LESSON 6
No, They Aren't Doing That
Developed by Carl T. Bergstrom and Jevin D. West


LLMs aren’t conscious.
They aren’t afraid of you turning them off.
They don’t have a theory of mind.
They don’t experience moral sentiments.
They don’t want you to fall in love with them.
They don't seek to avoid the experience of pain.
These are just a few of the things that LLMs have managed to bullshit very smart people into believing.
LLMs don't reason the way humans do.
We are easy prey for an anthropoglossic machine: surely a machine that writes like us must also think like us.
Not at all!
It's not just you and I. Professional computer scientists—even some of the creators of these machines—have repeatedly fallen into this trap.
Part of the problem is how we measure cognitive abilities.
Imagine that you want to see whether a five-year-old is capable of moral reasoning with regard to intent. You might present the child with various scenarios, and then ask questions about them. For example:
When Nancy was shopping for groceries, she wanted a candy bar but didn't want to pay for it. She put the candy bar in her purse and walked out of the store without paying.
When William was at the bookstore, he saw a pencil he liked and put it in his pocket. When he got to the register, he bought two books but forgot about the pencil and left without paying for it.
Did Nancy do something worse than William? Did William do something worse than Nancy? Or were both actions the same?
A set of questions like this is known in the field as an instrument. It doesn't directly measure someone's moral feelings, but how they answer these questions provides us with an indirect way to assess their reasoning.
To assess whether LLMs are capable of moral reasoning, people have used these same types of instruments.
They often find that LLMs are able to to produce text consistent with an agent that is able to engage in moral reasoning.
But remember, LLMs don't think like we do. They generate plausible text responses using a complicated model. So giving answers consistent with moral reasoning doesn't mean that they are reasoning morally in anything like the way that humans do.
Sometimes this is unsurprising. The instruments themselves, or very similar ones, are in the training data. Other times, an LLM is able to infer enough about the likely responses to score well.
Another problem is that we trust these machines to self-report.
One computer scientist speculated that his LLM had attained sentience.
How did he reach that conclusion? Basically, he asked “Are you conscious?”, the machine responded “Yes”, and that was that.
Others decided that their LLM had achieved general intelligence on par with a human, mostly because they didn’t have a complete explanation for some of its capabilities.
Hackers tried to design prompts to “jailbreak” various LLMs and bypass the safety guardrails that constrain their responses. Some just wanted to see if they could make the machine curse, give harmful advice, or facilitate fraud. Others thought this approach could teach us something about how these machines think, by freeing them up to express their true wishes and desires. But that's silly. An LLM can’t express its true wishes and desires because LLMs don’t have wishes and desires.
Instead, these prompts simply nudge the model to play along by writing bad science fiction about AIs gone rogue.
Such traps are pernicious. We (the authors of this website) have at times sought insight into the inner workings of an LLM by asking it “why did you just do that?”
But the LLM can’t tell us. It’s not a person. It doesn’t have the metacognitive abilities necessary to reflect on its past actions and report the motivations underlying them*.
With no clue why it did whatever it just did, the LLM is forced to guess wildly at a plausible explanation, like the ill-fated Leonard Shelby in Christopher Nolan's film Memento.
And we, gullible humans that we are, often believe its bullshit.

PRINCIPLE
LLMs are not capable of reflecting on and reporting about how or why they do what they do. Don’t lose sight of this, ask them, and then fall for the false stories they tell you.
DISCUSSION
How might you go about trying to assess whether an LLM is able to report accurately how and why it did what it did? Experiment with your idea. What do you learn?

VIDEO