Language models can only write ransom notes

Allison Parrish

Posted: 2024-02-26

This is the text of a talk I gave to NYU’s Digital Theory Lab in February 2024.

Ripped from the headlines

Let’s start with the notion that the output of large language models (LLMs) is a kind of collage. Think of it like this. The first step of training a language model is tokenization, in which the documents in the training data are broken up into tokens. These tokens might consist of individual characters, or entire words, or “byte pairs,” or any number of other ways that a digital text can be portioned up. The code that generates text from the model “samples” a token type that is likely to come next, given some context, and sticks it to the end of the output. In theory, any individual token in the output can be traced back to the source document (or documents) in which that token occurs. The output, then, is a collage: tokens from various source documents, stuck together.
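
To make the collage framing concrete, here is a minimal sketch in Python (a toy illustration, not a description of any production system; the “source documents” are invented for the example). A few tiny documents are tokenized into words, a table of next-token counts is built, and each token in the sampled output is traced back to the documents in which it appears. At this scale the model amounts to a word-level Markov chain, but the collage structure is the same.

```python
import random
from collections import defaultdict

# three invented "source documents"
documents = {
    "doc_a": "the night was long and the road was dark",
    "doc_b": "the road home was long",
    "doc_c": "night fell and the dark was kind",
}

# tokenize: here, just split on whitespace (real models use subword tokens)
tokens = {name: text.split() for name, text in documents.items()}

# record which documents each token type was "torn" from
sources = defaultdict(set)
for name, toks in tokens.items():
    for tok in toks:
        sources[tok].add(name)

# count which token follows which: a next-token frequency table
following = defaultdict(lambda: defaultdict(int))
for toks in tokens.values():
    for prev, nxt in zip(toks, toks[1:]):
        following[prev][nxt] += 1

# generate: repeatedly sample a likely next token and stick it onto the output
word = "the"
output = [word]
for _ in range(6):
    candidates = following[word]
    if not candidates:
        break
    word = random.choices(list(candidates), weights=list(candidates.values()))[0]
    output.append(word)

# every token in the output can be traced back to its source documents
for tok in output:
    print(f"{tok!r:>12} torn from {sorted(sources[tok])}")
```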

A ransom note. Source: Sheila Sund from Salem, United States, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

It’s true that the appearance of any individual token in LLM output might be occasioned by statistical properties derived from that token’s appearance in thousands (or millions) of documents. Still, in my mind’s eye, I like to imagine that I can see each token typeset in the style of its sources, with (metaphorical) tear marks from where it’s been torn from the (virtual) page. The result resembles a classic Hollywood ransom note, in which a kidnapper creates a collage from physical typeset “tokens” (letters, bigrams, trigrams, words, phrases…) torn from newspapers and magazines and arranged to spell out a message instructing interested parties on the conditions for their hostage’s release.

Something is where it wasn’t

What distinguishes the ransom note from other forms of collage is the intention underlying its use. Dadaists and Cubists used collage in order to “collapse… the notion of materiality with reality, with stuff as being rather than representing” (Drucker 85)—a kind of sympathetic magic. Douglas Kearney (whom I quote more extensively below) uses collage in his visual poetry as part of a “long-term, sometimes ironic investigation of the condition of having context… through forced recontextualization” (Kearney 32). Creators of ransom notes, however, use collage as a way to “erase indexical marks of the self: smudges on linen, fingerprints, and handwriting” that might help investigators identify them (Stern 130). The ransom note writer “de-authorizes” the cut-and-pasted words by removing them “from their privately sanctioned, publicly disseminated context” (Stern 95), ironically rendering the text’s material form (typeface, size, letterform, color, paper stock) irrelevant, in an attempt to achieve uniformity and anonymity.

I would argue that LLM output works the same way: each token, taken from its source, is “de-authorized,” its material form and original context rendered irrelevant, and subsequently used to create the illusion of a uniform voice. The people responsible for tokens coming to be arranged in this way—the engineers, the product designers, the marketing departments, the CEOs of “artificial intelligence” companies—are rendered effectively anonymous. LLM output is, however, a curious variety of ransom note, in which the words are cut-and-pasted from the very thing that the LLM makers are holding hostage: our own writing, as scraped from the world wide web over decades.

In fact, I would propose that all automated text composition is a form of collage—not just language models (whether small, medium or large) but also, say, context-free grammar generators and Mad Lib-style mail merges. In each case, units of text are taken from some context (even if that context is a list in a database) and inserted into a new context, in order to produce a new meaning. In the words of Douglas Kearney, “something is where it wasn’t and is doing some new work where it is now” (Kearney 18).
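
As a small illustration of that broader claim, here is a toy Mad Lib-style generator (again a sketch; the template and word lists are invented for the example): each slot in a template is filled with a unit of text pulled from a list, so that something is, quite literally, where it wasn’t.

```python
import random
import re

# word lists: tiny "corpora" whose items will be moved to a new context
lexicon = {
    "adjective": ["monochrome", "intertextural", "unfathomable"],
    "noun": ["veldt", "archive", "typewriter", "corpus"],
    "verb": ["samples", "mourns", "recontextualizes"],
}

template = "the {adjective} {noun} {verb} another {noun}"

def fill(template, lexicon):
    # each {slot} is filled independently with a unit of text taken from
    # its list and set to work in a new context
    return re.sub(r"\{(\w+)\}",
                  lambda match: random.choice(lexicon[match.group(1)]),
                  template)

print(fill(template, lexicon))  # e.g. "the unfathomable archive mourns another corpus"
```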

That means that, as a computational poet, I am also a collagist. But I don’t want to make ransom notes. I would prefer to go about my collaging activities without behaving as a kidnapper might behave. I would prefer for my poems to not also be threats.

So I’m left with the task of setting forth some criteria that distinguish the kind of computational collage I want to make from the ransom notes that come out of LLMs. I wish there were an easy technical distinction between these two methods, but I don’t think there is: a Markov chain produces collage just as much as LLMs do; the difference boils down to a matter of scale. Additionally, I don’t want to say that LLM outputs are bad, in the sense of either “evil” or “insufficient.” I am but a humble poet and will leave such declarations to the critics (e.g., McQuillan) and moral philosophers (e.g., Golumbia). In this essay I try to articulate the squishier criteria that remain to me: those based on aesthetics, ethics, and intent.

Recognizing the cut

In his collection of talks and essays Optic Subwoof, Douglas Kearney discusses at length the phenomenology of collage, particularly as it pertains to his own concrete poetry practice. In keeping with the ransom note metaphor above, Kearney offers the following initial proposition on collage:

Humanity’s first encounter with collage was a murder scene. […] The decontextualization (red from the inside) forced into a new context (the outside), the standing context (a walk on the veldt’s nigh monochrome) transformed as well. Nothing the same (Kearney 13–14).

“Disruption is a fact of collage,” Kearney writes, but then goes on to propose a number of conditions—beyond mere disruption—that an artifact must meet in order to produce a collage effect:

[T]he experience of collage requires that one recognizes two things from different contexts recontextualized into a new, single one…. [T]he hand behind the new context doesn’t get to keep its cutting hush-hush. […] You must recognize the evidence of the decontextualization that made this new context. You must notice the cut (Kearney 17–18, original emphasis)

In fact, it’s the visibility of “the cut” that enables the social relations and poetics that draw Kearney to collage in the first place. Kearney compares his collage poems to “the sonic collage productions of DJ Premier, Madlib, or J Dilla… [who] use samples to make music,” whose works “play on the ability to recognize the cut” in order to create an “intertextual and intertextural music” (Kearney 29).

The word “recognize” here is used in its double meaning: identify and show respect toward. When collage makes it possible to “recognize the cut,” it nourishes social relations between collagists, audiences, and creators of the sampled media. The result is a composition whose “textuality is not centralized… It’s communal and relational” (Kearney 35). Making the cut visible is a way of subverting authorship while also attending to those you’re in discourse with. A collage’s authorship is decentralized, but not uniform.

As I argued earlier, maintaining the anonymity of “the hand behind… the cut” is precisely the aim of both large language models and ransom notes. Large language models are designed with the intention to obscure the cut between tokens; likewise, ransom notes specifically draw attention away from the material-level cut and toward a reading of the surface text. Both of these forms of collage work against the affordances of collage: they have “decentralized” authorship, but with the seams of the cut smoothed over, the result is a text that is neither “intertextual” nor “intertextural.”

We merely toast the bread

Kearney explains the word “intertextural” with another analogy to music made with samples. Musicians who make use of sampling “are after text and texture… that will combine to make a single text.” The sampled materials

must… have different textures because the sources were recorded using different instruments, with different players, on different mics, in different rooms, using different mixes, made by different engineers, stored on often different media…. It isn’t just about a bass line. It’s about all of those differences juxtaposed with other differences and how they hold together or fall apart. (Kearney 29–30)

The texture of the sampled phrase, in other words, results from the material history that brought the phrase into being—the performers, the instruments, the rooms, the microphones, and so forth. This texture inheres in the sample even when it has been digitized—the digitization process is only another step in the phrase’s material history. (Kearney made his visual poems with photo-editing software, after all.)

Likewise, I would argue that the texture of a stretch of language results from its own material history: the fingers that keyed in the letters, the hands that scanned the book, the typesetters whose fingertips on the trackpad set the margins and whose roving eyes found rivers to correct, the vocal tracts that shaped the air that became the words that were transcribed, possibly by another human, or possibly by a speech-to-text model which is itself trained on—and therefore contains within it—the keystrokes and voices of many others. This is as true for LLM outputs as it is for any other form of collage. Every time a token’s occurrence in the corpus increments an integer or nudges a floating-point number in a neural network layer, that nudge or bump resulted at some point from human action. In the case of computational collage, we can add to that material history the hands that wrote the code, the hands of those who built the data center where the code runs, the hands that mined the rare metals to build those computers, and so forth. All of this is available to us as texture in the cut.
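
To make that “nudge” concrete, the sketch below performs one step of gradient descent on a one-matrix bigram model for a single bigram. The vocabulary, weights, and learning rate are all invented for the example; the point is only that each change to a floating-point number is occasioned by a pair of words that some person once wrote.

```python
import math

# a toy vocabulary and a single weight matrix of logits:
# weights[i][j] scores "token j follows token i"
vocab = ["the", "night", "was", "long", "dark"]
index = {word: i for i, word in enumerate(vocab)}
weights = [[0.0] * len(vocab) for _ in vocab]

def softmax(row):
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(prev_word, next_word, lr=0.1):
    """One gradient-descent step of cross-entropy loss for a single bigram."""
    i, j = index[prev_word], index[next_word]
    probs = softmax(weights[i])
    for k in range(len(vocab)):
        # gradient of cross-entropy with respect to the logits: p_k - 1[k == j]
        gradient = probs[k] - (1.0 if k == j else 0.0)
        weights[i][k] -= lr * gradient  # the nudge

before = list(weights[index["the"]])
train_step("the", "night")  # a bigram drawn from some human-written sentence
after = weights[index["the"]]
print([round(a - b, 4) for a, b in zip(after, before)])
```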

Kearney’s discussion of texture brings to mind Anni Albers’ remarks on the sense of touch as it relates to textile materials. She argues that our faculty for perception by touch is degenerating in the modern era, as

[o]ur materials come to us already ground and chipped and crushed and powdered and mixed and sliced, so that only the finale in the long sequence of operations from matter to product is left to us: we merely toast the bread. No need to get our hands into the dough. No need—alas, also little chance—to handle materials, to test their consistency, their density, their lightness, their smoothness (Albers et al. 44).

She goes on: “We touch things to assure ourselves of reality. We touch the objects of our love” (Albers et al. 44). In other words, we touch things to recognize them (in both senses of the word).

Now I would argue that even digital text has tactility and can be touched: think of the feel of the home row nubs under your fingers on a computer keyboard, or the smoothness of the glass that you swipe to make digital text scroll by on your phone. But the analogy I’m trying to draw here is between the tactility of materials and the material history of text. An intertextural collage allows us to recognize (identify, respect, love, touch) that materiality. The large language model, on the other hand, is designed to prevent exactly this, and in this analogy is more similar to Albers’ “modern industry” that “bars us from taking part in the forming of material” (Albers et al. 44). No wonder then that LLM outputs feel “ground and chipped and crushed and powdered and mixed and sliced.”

Hoards, collections, bodies

The distinguishing feature of collage as a composition method is that collages are made from pre-existing materials. In their recent book Tone, Sofia Samatar and Kate Zambreno contrast two techniques for bringing materials together for collage: the collection and the hoard. The distinction between the two, they explain, is that

[i]tems in a collection are lovingly arranged and displayed, each with its own particular place and meaning…. These items have been curated; they receive care…. The hoard, by contrast, is expansive and composed of things that have lost their value, things made with the purpose of losing their value… (Samatar and Zambreno 45–46)

Samatar and Zambreno also distinguish the hoard from the collection with the criterion of distance. “Distance,” they write, “the ability to stand outside and take overview, is essential to collecting.” In the hoard, however, “there is no distance… one is enveloped in the hoard and is hoarded by it” (Samatar and Zambreno 50–51). The distinction between collection and hoard is both affective and relational: how the collagist feels about the material, and how they orient themselves toward it.

The source material for a computational text collage is commonly called its “corpus” (Latin for “body,” and cognate with English “corpse”—there’s our murder scene again). I think it’s possible to broadly classify any computational corpus as either a collection or a hoard, following Samatar and Zambreno’s distinctions outlined above. The corpora of large language models certainly qualify as hoards. Scale, rather than curation, is the primary goal when creating such corpora, and though these corpora can be bought, sold, and licensed (Pooley), their creators believe that the content inside has no value worthy of remuneration (Davis).

It’s not that there is no “distance” in the LLM corpus, but that the question of distance does not arise, as the contents of the corpus are unread, unknown, unattributed, and often undisclosed. These corpora are, in Bender et al.’s words, “unfathomable” (Bender et al.). Samatar and Zambreno write that “The hoard is never full. It is a surfeit that alchemizes instantly into lack” (Samatar and Zambreno 45), and indeed LLM engineers are continually in search of larger and larger corpora to feed their models. One recent corporate LLM was trained on “trillions of tokens” (Touvron et al.). Many of those tokens were likely produced by previous versions of the same models, whose output is now on the public web (Shumailov et al.) (“one is enveloped in the hoard and hoarded by it”).

Melancholy and mourning

Samatar and Zambreno argue that distance—whether physical, temporal, or psychic—is what enables a collection to create “a space of meaning opposed to the brutal transformation of labor into exchange value” (Samatar and Zambreno 51). But distance’s consequence, and meaning’s price, is affective entanglement with what is collected. “Distance makes melancholy,” they write, and

Perhaps, we have wondered, a collage is always melancholy, after Freud’s definition, of that which cannot be let go…. Perhaps the tone of any private collection is necessarily depressive, reparative, mournful, comforting, blue. (Samatar and Zambreno 32)

In the context of computational text collage, I propose that “distance” emerges when the collagist acknowledges the material histories of their corpora and the collagist’s relationship with them—including the other human beings that brought these corpora into existence. Those others may be friends, mentors, ancestors, one’s earlier self, neighbors, or even perfect strangers. Regardless, the melancholy and the meaning of the collage arise only through the acknowledgment of the other’s absence.

Creators of large language models are very eager to conceal this distance. They do so by flattening the materiality of their corpora, thereby effectively severing the text from its own history and rendering uniform what had been equivocal—like bulldozing a graveyard. Yet the distance and the melancholy persist, despite this attempt to hide them away. When I’m writing with a large language model, I am all too aware of the ghosts and strangers whose voices I’m speaking with. The keyboard beneath my fingers hums with frustrated mourning.

Let us love the distance

A New York Times article from 1957 relates the story of the purported kidnapping of a former actress, Marie McDonald. The police investigating the case determined that the kidnapping had been a hoax undertaken by McDonald herself:

Investigating officers’ doubts of an actual abduction were sharpened by the discovery that a purported kidnap note found in the actress’ mail box had been composed partly of headlines cut from a newspaper later discovered in her fireplace. (Stern 137)

This story hints at how “de-authorized” snippets of text, as found in a ransom note, still ultimately retain their material history, and that material history in this case (i.e., having been cut from the newspaper in the fireplace) was enough to expose McDonald’s hoax. Likewise, with enough effort, we can also systematically expose the material history of generative AI outputs—see, e.g., Carlini et al. and Somepalli et al. We can know the distance between us and those whose voices we speak with when writing with LLMs, though LLM creators intentionally make this difficult.

I’m reminded of Simone Weil’s words to her friend Gustave Thibon:

Soon there will be distance between us. Let us love this distance which is wholly woven of friendship, for those who do not love each other are not separated… (Thibon 118)

Let us love this distance; we touch the objects of our love. Computational text collage is suited to many forms and emotions—form letters, satire, poetic juxtaposition, avant-garde linguistic exploration—but I’d like to believe love is among them. More broadly, I would claim that the act of making a collage always has stakes that are interpersonal, historical, and contextual. In my own work, I prefer to face those stakes head-on, by using only corpora whose relation to me I can know and understand, rather than the “unfathomable,” coercively dematerialized corpora of large language models. In my work, I want to make the distance manifest, rather than hide it away.

The reason I make poems is that I am trying to find new arrangements of language that—in the words of Jackson Mac Low—“bring about new kinds of pleasure” (Mac Low and Tardos xxxii). I use computation to do so because I think it’s a useful tool for bringing about these new arrangements. Following Kearney’s lead, I make my collage “cuts” visible—not just to ensure that my work is intertextual and intertextural, but also to foreground the computational methods that I use to make cuts in the first place. The cut’s shape and manner are, after all, just as meaningful as the source materials it juxtaposes.

Capital always tries to convince us that any new (potentially profitable) technology is not just the natural telos of history, but also that it renders all other approaches obsolete. Large language models are only the latest example of this pattern. I hope I’ve demonstrated how LLMs are insufficient for my purposes as a computational poet, and are often, in fact, counterproductive to my goals. Thankfully, the LLM is only one approach to computational text collage, and the potential of its alternatives is far from being exhausted.

Works cited

Albers, Anni, et al. On Weaving. New expanded edition, Princeton University Press, 2017.
Bender, Emily M., et al. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, 2021, pp. 610–23. ACM Digital Library, https://doi.org/10.1145/3442188.3445922.
Carlini, Nicholas, et al. Extracting Training Data from Large Language Models. arXiv:2012.07805, arXiv, 15 June 2021. arXiv.org, https://doi.org/10.48550/arXiv.2012.07805.
Davis, Wes. “AI Companies Have All Kinds of Arguments Against Paying for Copyrighted Content.” The Verge, 4 Nov. 2023, https://www.theverge.com/2023/11/4/23946353/generative-ai-copyright-training-data-openai-microsoft-google-meta-stabilityai.
Drucker, Johanna. The Visible Word: Experimental Typography and Modern Art, 1909-1923. University of Chicago Press, 1994.
Golumbia, David. “ChatGPT Should Not Exist.” Medium, 14 Dec. 2022, https://davidgolumbia.medium.com/chatgpt-should-not-exist-aab0867abace.
Kearney, Douglas. Optic Subwoof. First Edition, Wave Books, 2022.
Mac Low, Jackson, and Anne Tardos. Thing of Beauty: New and Selected Works. University of California Press, 2007. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/nyulibrary-ebooks/detail.action?docID=2002084.
McQuillan, Dan. “Predicted Benefits, Proven Harms: How AI’s Algorithmic Violence Emerged from Our Own Social Matrix.” The Sociological Review Magazine, June 2023. thesociologicalreview.org, https://doi.org/10.51428/tsr.ekpj9730.
Pooley, Jeff. “Large Language Publishing.” Upstream, 2 Jan. 2024, https://doi.org/10.54900/zg929-e9595.
Samatar, Sofia, and Kate Zambreno. Tone. Columbia University Press, 2023.
Shumailov, Ilia, et al. The Curse of Recursion: Training on Generated Data Makes Models Forget. arXiv:2305.17493, arXiv, 31 May 2023. arXiv.org, https://doi.org/10.48550/arXiv.2305.17493.
Somepalli, Gowthami, et al. Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models. arXiv:2212.03860, arXiv, 12 Dec. 2022. arXiv.org, https://doi.org/10.48550/arXiv.2212.03860.
Stern, Arden Eleanor. The Ransom Note Effect: Eclectic Typography in American Visual Culture. 2012. University of California, Irvine, Ph.D. ProQuest, https://www.proquest.com/docview/1024139545/abstract/780DB4D7B26648D5PQ/1.
Thibon, Gustave, and Joseph-Marie Perrin. Simone Weil as We Knew Her. 2nd ed., Routledge, 2003, https://doi.org/10.4324/9780203666609.
Touvron, Hugo, et al. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971, arXiv, 27 Feb. 2023. arXiv.org, https://doi.org/10.48550/arXiv.2302.13971.