Don’t Be Misled by GPT-4’s Gift of Gab


This is an edition of The Atlantic Daily, a newsletter that guides you through the biggest stories of the day, helps you discover new ideas, and recommends the best in culture.

Yesterday, not four months after unveiling the text-generating AI ChatGPT, OpenAI launched its latest marvel of machine learning: GPT-4. The new large-language model (LLM) aces select standardized tests, works across languages, and can even detect the contents of images. But is GPT-4 smart?


A Chatty Child

Before I get into OpenAI’s new robot wonder, a quick personal story.

As a high-school student studying for my college-entrance exams roughly two decades ago, I absorbed a bit of trivia from my test-prep CD-ROM: Standardized tests such as the SAT and ACT don’t measure how smart you are, or even what you know. Instead, they are designed to gauge your performance on a specific set of tasks—that is, on the exams themselves. In other words, as I gleaned from the nice people at Kaplan, they are tests to test how you test.

I share this anecdote not only because, as has been widely reported, GPT-4 scored better than 90 percent of test takers on a simulated bar exam and got a 710 out of 800 on the reading and writing section of the SAT, but also because it provides an example of how one’s mastery of certain categories of tasks can easily be mistaken for broader skill or competence. This misconception worked out well for teenage me, a mediocre student who nonetheless conned her way into a respectable university on the merits of a few crams.

But just as tests are unreliable indicators of scholastic aptitude, GPT-4’s facility with words and syntax doesn’t necessarily amount to intelligence, which is to say a capacity for reasoning and analytic thought. What it does reveal is how difficult it can be for humans to tell the difference.

“Even as LLMs are great at producing boilerplate copy, many critics say they fundamentally don’t and perhaps cannot understand the world,” my colleague Matteo Wong wrote yesterday. “They are something like autocomplete on PCP, a drug that gives users a false sense of invincibility and heightened capacities for delusion.”

How false is that sense of invincibility, you might ask? Quite, as even OpenAI will admit.

“Great care should be taken when using language model outputs, particularly in high-stakes contexts,” OpenAI representatives cautioned yesterday in a blog post announcing GPT-4’s arrival.

Although the new model has such facility with language that, as the writer Stephen Marche noted yesterday in The Atlantic, it can generate text that’s virtually indistinguishable from that of a human professional, its user-prompted bloviations aren’t necessarily deep—let alone true. Like other large-language models before it, GPT-4 “‘hallucinates’ facts and makes reasoning errors,” according to OpenAI’s blog post. Predictive text generators come up with things to say based on the likelihood that a given combination of word patterns would come together in relation to a user’s prompt, not as the result of a process of thought.

My partner recently came up with a canny euphemism for what this means in practice: AI has learned the gift of gab. And it is very difficult not to be seduced by such seemingly extemporaneous bursts of articulate, syntactically sound conversation, regardless of their source (to say nothing of their factual accuracy). We’ve all been dazzled at some point or another by a precocious and chatty toddler, or momentarily swayed by the bloated assertiveness of business-dude-speak.

There is a degree to which most, if not all, of us instinctively conflate rhetorical confidence, a way with words, with comprehensive smarts. As Matteo writes, “That belief underpinned Alan Turing’s famous imitation game, now known as the Turing Test, which judged computer intelligence by how ‘human’ its textual output read.”

But, as anyone who’s ever bullshitted a college essay or listened to a random sampling of TED Talks can surely attest, speaking is not the same as thinking. The ability to distinguish between the two is important, especially as the LLM revolution gathers speed.

It’s also worth remembering that the internet is a strange and often sinister place, and its darkest crevasses contain some of the raw material that’s training GPT-4 and similar AI tools. As Matteo detailed yesterday:

Microsoft’s original chatbot, named Tay and released in 2016, became misogynistic and racist, and was quickly discontinued. Last year, Meta’s BlenderBot AI rehashed anti-Semitic conspiracies, and soon after that, the company’s Galactica—a model intended to assist in writing scientific papers—was found to be prejudiced and prone to inventing information (Meta took it down within three days). GPT-2 displayed bias against women, queer people, and other demographic groups; GPT-3 said racist and sexist things; and ChatGPT was accused of making similarly toxic comments. OpenAI tried and failed to fix the problem each time. New Bing, which runs a version of GPT-4, has written its own share of disturbing and offensive text—teaching children ethnic slurs, promoting Nazi slogans, inventing scientific theories.

The latest in LLM tech is certainly clever, if debatably smart. What’s becoming clear is that those of us who opt to use these programs will need to be both.


Today’s News
  1. A federal judge in Texas heard a case that challenges the U.S. government’s approval of one of the drugs used for medication abortions.
  2. Credit Suisse’s stock price fell to a record low, prompting the Swiss National Bank to pledge financial support if necessary.
  3. General Mark Milley, the chair of the Joint Chiefs of Staff, said that the crash of a U.S. drone over the Black Sea resulted from a recent increase in “aggressive actions” by Russia.



Evening Read

Nora Ephron’s Revenge

By Sophie Gilbert

In the 40 years since Heartburn was published, there have been two distinct ways to read it. Nora Ephron’s 1983 novel is narrated by a food writer, Rachel Samstat, who discovers that her esteemed journalist husband is having an affair with Thelma Rice, “a fairly tall person with a neck as long as an arm and a nose as long as a thumb and you should see her legs, never mind her feet, which are sort of splayed.” Taken at face value, the book is a triumphant satire—of love; of Washington, D.C.; of therapy; of pompous columnists; of the kind of men who consider themselves exemplary partners but who leave their wives, seven months pregnant and with a toddler in tow, to navigate an airport while they idly buy magazines. (Putting aside infidelity for a moment, that was the part where I personally believed that Rachel’s marriage was past saving.)

Unfortunately, the people being satirized had some objections, which leads us to the second way to read Heartburn: as historical fact distorted through a vengeful lens, all the more salient for its smudges. Ephron, like Rachel, had indeed been married to a high-profile Washington journalist, the Watergate reporter Carl Bernstein. Bernstein, like Rachel’s husband—whom Ephron named Mark Feldman in what many guessed was an allusion to the real identity of Deep Throat—had indeed had an affair with a tall person (and a future Labour peer), Margaret Jay. Ephron, like Rachel, was heavily pregnant when she discovered the affair. And yet, in writing about what had happened to her, Ephron was cast as the villain by a media ecosystem outraged that someone dared to spill the secrets of its own, even as it dug up everyone else’s.



Culture Break

Read. Bootstrapped, by Alissa Quart, challenges our nation’s obsession with self-reliance.

Watch. The first episode of Ted Lasso’s third season, on Apple TV+.



“Everyone pretends. And everything is more than we can ever see of it.” Thus concludes the Atlantic contributor Ian Bogost’s 2012 meditation on the enduring legacy of the late British computer scientist Alan Turing. Ian’s story on Turing’s indomitable footprint is well worth revisiting this week.

— Kelli

Isabel Fattal contributed to this newsletter.
