The Meandering through Textuality Challenge
[This is a talk I gave as part of a panel entitled “Digging into Data” at the 2011 MLA. I was joined by Glenn Roe (of ARTFL) and Ray Siemens of the University of Victoria. My talk was called “The Meandering through Textuality Challenge: Reflections on the Humane Archive.”]
Digging into literature? Really? The word evokes images of pickaxes and front-end loaders, or else rakish Indiana-Jones types gamely hefting golden treasures from the otherwise unremarkable sands. The literary equivalent, one presumes, involves making similar excavations into the undifferentiated fields of textuality — the “words, words, words” of the burdened and overwhelmed reader — in the hope of finding . . .
Well, what exactly? I have many times suggested “pattern” as the treasure sought by humanistic inquiry: which is to say, an order, a regularity, a connection, a resonance. I continue to insist that this is, in the end, what humanists in general, and literary critics in particular, are always looking for, whether they’re new critics, new historicists, new atheists, new faculty, or New Englanders. Pattern is the linchpin of all humanistic argumentation from the Platonic dialogues to the Dialogic Imagination. Whether conceived as metaphysical reality or as desiring machine, pattern is the raw material of the hermeneutics of our discipline.
This would be a banal observation — as unenlightening and philosophically indefensible as saying “Everything is interpretation” or “Everything is text” — were it not for the fact that (like these worn phrases, once upon a time) it encourages us to see a connection that might otherwise be obscured. For if humanistic inquiry is about pattern, then it isn’t completely crazy to suggest that computers might be useful tools for humanistic inquiry. Because long before computation is about YouTube or Twitter or Google, it is about pattern transduction.
The words we use to describe what we’re doing reinforce this connection. Woolf reconceives gender identity; Hurston reimagines the interplay of race and place; Moretti usefully reconfigures the English novel. We likewise ask our students to notice, to see, to find, and ultimately (we hope) to “re-” as we do.
But not to dig, precisely. We dig Hamlet, naturally, and we’d of course like them to dig it as well, but we do not present the task of literary criticism or historiography as the process of finding some intact, but buried object beneath the surface. That’s because we have for a very long time now conceived of the patterns we’re looking for not as “out there,” but as “in here” — not as preexisting ontological formations, but as emergent textual epiphenomena. Someone might discover a new planet, but anyone who says they have “discovered” the origins of the Atlantic slave trade is (we hope) speaking metaphorically.
My purpose in saying all of this is not to criticize the eponymous title of our panel and the grant program that threatens to make it the natural term. As one of the people called in by the NEH as an adviser during the creation of that program, I have less right than most to lodge a complaint. But I would like to suggest that the terminology we are increasingly adopting to describe text analysis in literary study — or, for that matter, any “big data” project in the humanities — is threatened with metaphor shear.
“Metaphor shear,” you’ll recall, is that incomparable term coined by Neal Stephenson to describe the experience of using Microsoft Word:
Anyone who uses a word processor for very long inevitably has the experience of putting hours of work into a long document and then losing it because the computer crashes or the power goes out. Until the moment that it disappears from the screen, the document seems every bit as solid and real as if it had been typed out in ink on paper. But in the next moment, without warning, it is completely and irretrievably gone, as if it had never existed. The user is left with a feeling of disorientation (to say nothing of annoyance) stemming from a kind of metaphor shear — you realize that you’ve been living and thinking inside of a metaphor that is essentially bogus. (Stephenson 63-64)
Metaphor shear, we should notice, is not the dispiriting revelation of the man behind the curtain. Neither is it that sudden release from a trance one experiences when the fire alarm goes off in a theater or we miss our stop because of a novel. Metaphor shear is a moment of exasperated surprise born of an entirely incorrect notion about what is actually happening. Not “It’s only a movie!” but rather, “Wait. This isn’t a movie?”
And so it is with the idea of digging. When I started in text analysis — and in Internet time, we’re talking eons ago — being a text analysis practitioner was something like being a devoted student of Coptic paleography or of the Monophysite heresy. No one really cared (or, for that matter, understood what we were doing), but we loved what we were doing, and the small but solidly international group of practitioners was enthusiastic and supportive. None of us could have imagined jobs in the area, articles in The New York Times about what we were doing, buzz at the MLA about data mining, or “n-gram” as a trending topic on Twitter. None of us would have dared to dream of an Office of Digital Humanities at the NEH giving actual money to people like us.
But now we’ve sort of arrived. That came about in part because of the general rise of the Internet, the creation of large-scale text archives like Google Books, and the rebranding of artificial intelligence as the provably useful and occasionally astonishing task of “data mining and machine learning.” Could the marvels of the latter have implications for humanistic study? People began reading our articles, inviting us to give talks, asking us to write books. Really, life could not be better for someone like me who would rather write software and stare at columns of words and numbers all day than do just about anything else. But people are asking us a question that we never really thought to ask ourselves: Where are the results?
It’s not that we haven’t ever used that term; we use it all the time. We also use terms like “hypothesis” and “control group” and “data point” and “method” and “success” and “fail.” Many of us are even guilty of describing ourselves as “digging.” But as with Microsoft Word, the metaphor is essentially bogus. It’s bogus even when we say it’s not. And that’s because we are still engaged in humanistic inquiry.
To work with text is not automatically to be so engaged. Scientists studying the human genome are working with lots of text, as are marketers looking for ways to sell motor oil. We might even say that they are looking for pattern, but, to radically abuse Bateson’s phrase, therein lies the difference that makes a difference. To search the human genome for a pattern that might indicate a genetic component to type 2 diabetes is to search for a specific thing that we would very much like to find. The marketers are looking for something that correlates with purchases of motor oil. When they find it (it being anything from potato chips to bath towels), they’ll tell us.
What are the humanistic diggers looking for?
One project is called, “Digging into Image Data to Answer Authorship-Related Questions.” What do “authorship-related questions” involve? You know: “finding salient characteristics of artists” (ibid.). Another, called “Digging into the Enlightenment,” proposes to discover “how the spread of ideas at the global scale relates to the dynamical processes that operate at the local scale.” Yet another, the laconically entitled, “Harvesting Speech Datasets for Linguistic Research on the Web” proposes to “evaluate theories about the form and meaning of prosody.” No one working on the human genome — or, for that matter, the motor oil-ome — would tolerate the vague and imprecise language here employed. How will they know that the characteristic is “salient?” And where is the null hypothesis in their search for the interactions between the global and the local? How many “theories about the meaning of prosody” do you suppose they’ll discover to be irrefutably false?
Perhaps I seem to ridicule these projects. I don’t mean to. If anything, I mean to suggest their clear alliance with the grandest traditions of humanistic inquiry. Every project gleefully proclaims itself to be “digging into data,” but on closer inspection, it becomes clear that they aren’t digging even in the metaphorical sense. They are, instead, doing something more akin to the meandering parole of the English or history classroom: asking questions, suggesting answers, reading, pondering. The astonishing thing isn’t, in the end, the ways in which high-performance computing and mega-scale datasets transform the humanities; rather, it’s how much of the hermeneutical basis of humanistic inquiry — the character of its discourse and the eternal tentativeness of its “results” — remains invariant. The revolution is not hermeneutical so much as methodological.
Which is not to say that it is any less of a revolution. In fact, it might be more revolutionary than anything that has happened in literary study in fifty years, precisely because the traditional humanities disciplines are so radically (if you’ll pardon me) undermethodologized. And that’s precisely why we need to get our metaphors right.
This is made harder than it should be by the fact that disciplines (and, just as often, companies) unconcerned and in some cases unfamiliar with the terms of humanistic discourse had the privilege of naming the animals. It’s hard to imagine even the most positivistic of the old nineteenth-century philologists referring to what they did as “mining,” and yet what could be more natural to an engineer? But even if we are unable to change the language of what we do, we can remind ourselves that just because the language is borrowed from another discourse does not mean that it now has the same meaning it once did. Indeed, the still-nascent discourse we call “Digital Humanities” might be most precisely defined as the attempt to figure out what that new meaning is.