[This paper was presented at the Digging into Data Conference in June 2011. It was a response to Data Mining with Criminal Intent, one of eight projects developed under the auspices of the Digging into Data Challenge (funded by NEH, NSF, SSHRC, and JISC). “Data Mining with Criminal Intent” worked with data from The Proceedings of the Old Bailey, combining Zotero (for data management) with TAPoR tools (like Voyeur). The project was the work of a group of scholars from Canada, the UK, and the US, including Dan Cohen (US Lead), Tim Hitchcock (UK Lead), Geoffrey Rockwell (Canadian Lead), Cyril Briquet, Frederick Gibbs, Jamie McLaughlin, Joerg Sander, Robert Shoemaker, John Simpson, Stéfan Sinclair, Sean Takats, and Bill Turkel.]
I think this work is completely brilliant. I am compelled to say that, of course, because I’m friends with many of the people on the team, and I don’t know that they’d keep working with me if I got up and said it was terrible.
I go way back with a few of them, far enough to remember a time when an event like this would have seemed completely unimaginable. We did imagine conferences in which classically-trained humanists got together to geek out with maps, graphs, trees, and code. We also imagined that there would be no more than eight of us in attendance. An international grant competition, funded by four major agencies, that would culminate in a conference attended by members of the media? I’ve been suppressing the urge to laugh since we began.
I recall a particular moment, in 2002 or so, when I gave a talk with Stéfan Sinclair, Geoff Rockwell, and a few others on a panel entitled “New Directions in Text Analysis.” You must understand, 2002 was eons ago. TAPoR had just been funded, Stéfan was demoing Voyeur’s great-grandfather, Zotero did not exist (the world was condemned to EndNote), the Proceedings of the Old Bailey did not exist (though it was about to), and I was showing off visualizations of the Moby Shakespeare collection. Text analysis was a minor act in DH (most people were busy creating the data that we are just now beginning to dig). But the “age of tools” was starting to emerge, and revolution was in the air. So we all got up and made the following points (aimed squarely at the older guard):
- Text analysis is a hidebound backwater.
- It’s a backwater, because the people who do it are trying to escape the complexities of humanistic inquiry by trying to be scientists.
- This will never lead anywhere, because the questions that interest humanists are not tractable through purely scientific methods.
- The way out is to embrace a spirit of play, to recover the rhetorical posture of inventio, and to place subjective engagement at the center of digital humanities.
- You guys suck at science.
We were a little more polite than that, but you get the idea.
Our words were variously received. Some said that we had said what needed to be said. On the other hand, I remember one aged member of the text analysis community magisterially declaring that we could play with ourselves all we want. I will never forget one comment, though. A neither aged nor particularly cantankerous member of the text analysis community surveyed our graphs and visualizations, stood up, and said, “Isn’t this just art?”
He didn’t mean it as a compliment. What he meant was that data visualization informed by humanistic values is neither fish nor fowl. It neither provides the facts upon which science thrives, nor the themes and patterns that sustain humanistic inquiry. It is, at best, a kind of amusement. Beautiful, perhaps — maybe even profound in its own way. But not to be confused with serious academic scholarship.
The work presented here seems to me impeccable on at least two vectors. First, it’s built with well-architected, well-engineered code by people who know what they’re doing. And that’s a phenomenon that extends all the way from the unseen spheres of the backend datastore to the glorious glamour shots up front. I’m not sure the DH community knew how to build software when the revolution started. We do now. Or, at least, these guys do.
The second vector that impresses me is the “sciencey” part of it. Because part of what these guys are doing is, assuredly, based in the customs and methods of science. That was perhaps the point upon which we protested too much ten years ago. The truth is that digging into data is about numbers, statistics, curves, ratios, control groups, and experiments. This project’s indebtedness to science is everywhere apparent, and there’s no reason they should deny it, even if there perhaps was a reason to deny it ten years ago.
But what really impresses me about this project — what, for me, injects a sense of actual joy into it — is a line offered on the very first page of their whitepaper: “Given that the Old Bailey contains about 127 million words of text related to past crimes, we knew that there would be unusual and compelling stories to be told.” Perhaps it’s just the disarming folksiness of that phrase that I find so charming (it’s hard to imagine the scientists and engineers who built the Large Hadron Collider telling their funders that they expect “good stories” to emerge from their efforts). But actually, I think there’s something serious being put forth with that line — something that represents an important moment of maturity for digital humanities, and for the project of large-scale data analysis in general.
If the creators of this project are unabashed in their use of scientific tools and methods, they are likewise unapologetic in their description of why they are doing so. The Old Bailey, like the Naked City, has eight million stories. Accessing those stories involves understanding trial length, numbers of instances of poisoning, and rates of bigamy. But being stories, they find their more salient expression in the weightier motifs of the human condition: justice, revenge, dishonor, loss, trial. This is what the humanities are about. This is the only reason for an historian to fire up Mathematica or for a student trained in French literature to get into Java. The authors express some worry about the fate of what they call “Ordinary Working Historians” or OWHs (an acronym that, when pronounced, makes the sound of exasperation), and they are right to have this concern. But I think we can feel completely confident that they will eventually reach that audience. Because if there’s one thing that’s better than theorizing about interface usability and intuitiveness and transparency, it’s sharing the concerns of your users. And they do.
Perhaps we should ask of this project (and of all the projects we’ve looked at thus far): Is it just art? I, for one, am ready to say “yes.” You come to it to be changed, but also to be reaffirmed. You struggle with its novel logics and modes. You sometimes marvel at it. Often, you don’t understand it at all. It rewards patience. It values adaptation. It speaks to the individual and to the group. It lies to you. It tells you the truth. It makes you look good.