
Setting aside for a moment the many other dubious features of this viral New York Times quiz about AI vs. human writers, which don’t really bear thinking about, it’s interesting that pretty much all of the attempts to prove GenAI’s ability to make works of narrative art or literature focus on short, vibe-based excerpts, using models trained to imitate specific human professionals. My question: Why is that?
There’s a 2025 academic study I read a few months ago[1] that performed a version of this exact test, more or less: it prompted both ChatGPT and “expert” human writers to produce short texts (up to 450 words) emulating famous authors (Junot Diaz, Han Kang, Margaret Atwood, etc.), then asked MFA students and lay readers to evaluate the samples. Both lay readers and MFA students strongly preferred the human samples to the untrained imitative AI samples, but once the AI was “fine-tuned” on the complete works of Diaz et al., both groups preferred the imitative AI texts, again for writing samples of 450 words or less.[2] Amusingly, the authors’ conclusions here are mostly labor-related: “While we do not account for additional costs of human effort required to transform raw AI output into cohesive, publishable novel length prose, the median fine-tuning and inference cost of $81 per author represents a dramatic 99.7% reduction compared to typical professional writer compensation.” Great.
Again, let’s set aside some of the obvious criticisms. Let’s set aside what it means in this case for one orphaned paragraph to be better than another, or whether clarity and simple rhetorical devices (e.g. anaphora) are perhaps more highly valued by readers of a free-floating decontextualized paragraph than they would be by readers of a paragraph that forms part of a longer work. Why are we limiting ourselves to evaluating paragraphs at all? I don’t go to the bookstore to buy a paragraph of Han Kang or Cormac McCarthy. The relevant question is: Would I prefer a book written by an AI Cormac McCarthy to one written by the author himself?
But I’d prefer to make the challenge much easier. It isn’t really fair to judge all of us by the standard of Blood Meridian (which title the presumably human author of the Times quiz originally misspelled as Meridien). Can GenAI actually produce, without a human actively guiding the narrative, a bog-standard novel, or even a short story, that is enjoyable? Or even, on a narrative level, legible?
Let’s also forget about highbrow literature. What about a police procedural or detective novel? Can a GenAI writer track multiple characters across multiple scenes or produce several pages in a row of coherent dialogue? Can it narrativize events that properly track a character’s internal motivations, even in a rudimentary, childlike way? Could it, if prompted, invent a plot twist that makes logical sense, one that is — not even necessarily satisfying — but simply coherent? In short, can GenAI actually assemble a narrative, even badly, or string many paragraphs together as part of a consciously narrative project that doesn’t fall apart upon examination?
I find it telling that the second of the two studies mentioned in the Times quiz argues for readers’ preference for AI poetry, a genre of writing that, as it is practiced (for better or worse) by the usual gatekeepers, simply has a higher barrier to appreciation and a different relationship to pleasure: contemporary poetry may resist immediate interpretation or assume facility with specific literary traditions. It’s a genre that can (maybe tends to) reward difficulty, for reasons beyond the scope of this squib, and there’s also a folk/doggerel/Tumblr subgenre of poetry that has always enjoyed greater mass appeal and that for obvious reasons is highly imitable. More to my own point, poetry is not bound to narrative coherence in the way most fiction and nonfiction writing is, and poems can also be really short.
I didn't have any trouble selecting the human-made paragraphs in the Times quiz, often from the first two sentences alone. (Except for the Carl Sagan excerpt, which was tricky because of the GPT-like use of anaphora; I got that one wrong.) For the time being, I think, AI writing continues to show its hand. But I also didn’t find any of the paragraphs, human or AI, particularly good, or at least I didn’t feel I could judge them. (I’m not sure I would have found poems so easy.) Although I picked the human text four times out of five, I don’t think that tells me anything meaningful about my preferences as a reader of narrative writing.
For me it just raises the question of whether GenAI creative writing can do more than reproduce vibes. We know it can write college essays and produce confident (if dull) argumentative essays more generally. But can it produce a legible narrative text on its own, even something structurally and linguistically simple like, I don’t know, a Raymond Carver pastiche, start to finish? Most arguments in favor of AI’s writing abilities, whether utopian or apocalyptic, seem to sidestep these narrative questions and focus on how the writing comes across on the level of the sentence. I’m starting to think they’re all full of it.
Of course not many people read books anymore, so maybe the whole subject is moot. But everyone likes movies, and intuitively understands how they work at the level of grammar and syntax. I notice the same sleight of hand with X.com boosters who post all of those AI-generated film clips. Like this one:
This is not a film clip, of course, since it doesn’t resemble anything you’d ever see in a movie. Its grammar sort of imitates a movie trailer, which is to say a movie vibe, one that is composed entirely of repetitive <1-second shots. As a “trailer” it doesn’t really make any sense, and is obviously derivative of its source material in ways that go beyond a prompt to imitate a director’s style (see images below), but I’m not asking about aesthetic qualities. Could AI produce an actual 30-second movie scene that makes even rudimentary sense? Could it follow the basic syntactic rules of film editing in order to convey a narrative event? Could it produce a single shot that lasts, say, 10 seconds, and that features characters interacting with a world? One that isn’t incoherent or uncannily inhuman?


I’m not asking rhetorically. I don’t actually know; I haven’t played around with GenAI enough to know. I’m curious if anyone does. But from what I’ve read it’s not at all clear to me that GenAI can do these things, and the predictive token-generating nature of these models suggests to me that this would be much more challenging than briefly imitating certain elements of a style of writing or film, of communicating a vibe, which is pretty much what the ancestors of GenAI chatbots from ELIZA forward were designed to do.
If the answer to all these questions is no, what are we doing here?
//
I’m also remembering Jay Caspian Kang’s fun article on using GPT-3 to fix his novel. He’s working with an earlier model of ChatGPT but I think most of the general impressions, good and bad, remain valid: ChatGPT’s parlor tricks are impressive, and its limitations fatally undercut the promises made about its powers. But I’m not convinced it’s true that, as Kang writes (in 2022!), “it seems, at least for now, that GPT-3 can generate its own stories,” except under close and intensive human management. Unless I’m missing something, ChatGPT can’t generate — and then populate dramatically — a narrative plot except as iterative piecemeal responses to human input, in sort of a Choose Your Own Adventure mode. For all its ingestion of the world’s creative material, it’s not clear to me it could produce, on its own, “the generic average of a murder mystery,” as a machine learning expert suggests at the end of the article. Would the clues actually line up with the deduction process, and would the murder make any sense? I could be wrong, but I doubt it.
Yet discussions I read about GenAI, and this Times quiz, seem to take it as a given that GenAI can do these things. Which is crazy, because those skills are integral to what most “creative writing” is: not vibes, not rhetorical flourishes, not arguments, but scene, summary, story.[3] I know that writers are pumping out romance novels and the like using ChatGPT, but my assumption is that they’re doing roughly what Kang did in his article, managing its output through a series of prompts and corrections, effectively directing a ghost writer that cannot think narratively on its own for more than, like, a few paragraphs. Admittedly, I haven’t read everything on the subject because I find it very depressing. Is there more to it?
My suspicion, mostly unfounded, is that inventing and communicating a narrative of significant duration and complexity requires modes of thinking that are totally alien to current GenAI models. Just spitballing, maybe it requires some level of understanding of the existence of embodied consciousnesses, the persistence of identity across time, the mutability of identity across time, the experience of time, and of course agency, which GenAI is notoriously bad at. Or something less than understanding, if you don’t like the term with respect to AI. It’s hard to imagine it’s a limitation of memory, but in my experience LLMs do not hold onto narrative threads well. They also can’t seem to initiate narrative events on their own, which is why those AI “movies” so often feel like nightmares where a character wants to perform an action but is somehow paralyzed.
I have many moral, humanistic, and aesthetic qualms with using GenAI for most of the tasks that fill my day, and like Kang I’m skeptical that it will ever produce creative work that thinking people will want to spend time with. I’m willing to be proven wrong, and I admit that the technology has use cases, even elements that can seem, at first blush, magical. And I don’t think writing books or making movies are, in a literal sense, mystical activities that require a human soul. Just as poetic meter and rhyme scheme are patterns that GenAI can reproduce in doggerel verses, plots and dialogue are fundamentally based on patterns, albeit much more complicated ones. From what I can see, those patterns lie beyond the abilities of GenAI, which cannot tell a story without a human at the wheel. Prove me wrong!
[1] I now see it’s actually one of two studies linked in the Times quiz. Man, I don’t know, I find it close to journalistic malpractice to frame this study as suggesting “many readers prefer A.I.-generated writing to human-authored works.” We’re not talking about works under any definition of the term. In the described study we’re talking about free-floating, context-free paragraphs of 450 words at most, generated by a model trained on specific authors’ entire corpuses and ordered to imitate them. Without that training, human writing is overwhelmingly preferred to AI texts by MFA grads and lay readers alike.
[2] I got a couple of small details wrong in my original post, which I’ve corrected.
[3] I’m eliding the question of “style” and its relationship to “content” in exactly the way Susan Sontag famously criticizes, but for what it’s worth I’m equally skeptical that a GenAI could maintain what we would generally understand as a coherent style, even one that is merely imitative, for the length of a novel. But then many novelists can’t either.



I've just started teaching a new course about kicking back against social media and AI by developing a creative writing practice and am noticing that the texts I want to share are utterly beyond GenAI's capacity. They mix personal experience with encounters with, and reflections on, an idiosyncratic haul of texts, artworks and events. The connections between these can't be bullet-pointed. Am wondering if LLM culture will trigger a counter-culture of these types of works in the future.
Thank you for elucidating the question that pings my mind every time this subject comes up. Clap, clap.
I want to barnacle onto this, though: "But can it produce a legible narrative text on its own, even something structurally and linguistically simple like, I don’t know, a Raymond Carver pastiche, start to finish?"
I guess it depends on what you mean by legibility. Carver's stories (e.g. "One More Thing") ask the reader to draw a lot of inferences from subtle cues, to read what's happening off the page by decoding info-dense sentences, and kind of 'carry the one' over and over again to reach a cumulative emotional effect. It's a level of thinking that escapes a lot of casual readers (because it requires thinking!) and strikes me as impossible for the type of AI writing I've seen, which, as you say, struggles with coherence past the vibe level. What's Carver without its Carver-ness?
I'm assuming you mean structurally and linguistically legible. But the idea of Carver-ness pings a thought I've had about style in the age of AI. My intuition says that, after formulaic genre fiction, stories with deep, gushing interiority would be easier to reproduce: the prose is showing its work, showing its thinking, along the way. For coherence, the reader does not have to think as much as accept. None of this disagrees with your point: AI still struggles, like most writers do, with juggling character, theme, story, scene. Carver, to me, just adds one more baton to juggle.