Beware the Coffee

March 29th, 2011

As several posts have nicely illustrated, creative visualizations of the entire Old Bailey archive can provide new and intriguing glimpses into history. It’s important to keep in mind that one can play around with smaller subsets of the data as well, just to get a quick sense of what’s going on in particular contexts. For example…

Using the prototype tool/process mentioned in the previous entry (Old Bailey API -> Zotero -> Voyeur), I was able to extract the OB records that contain the word “poison” (about 400).

I could then send those records (or a subset of them) to Voyeur to get a better sense of how poison is discussed in them.

Filtering out the usual stop words and other words that happen to be common to the trial records, like “prisoner”, I can see that “drank” is one of the more common words to appear in my search results. Zeroing in on the context of “drank”, I can see that one of the words that most commonly appears nearby is “coffee”.
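For readers who would rather see the idea than the tool, here is a minimal sketch of the same two steps: count words after dropping stop words, then count what appears near a keyword. It is not how Voyeur works internally. The stop list, the window size, the function names, and the three sample sentences are all invented for illustration and are not taken from the Old Bailey records.

```python
import re
from collections import Counter

# Hypothetical stop list: a few common English words plus "prisoner",
# which is common to nearly every trial record.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "it", "he", "she",
    "i", "was", "some", "that", "prisoner",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(records, stop_words=STOP_WORDS):
    """Count every word across the records, skipping stop words."""
    counts = Counter()
    for text in records:
        counts.update(t for t in tokenize(text) if t not in stop_words)
    return counts


def collocates(records, keyword, window=5, stop_words=STOP_WORDS):
    """Count words within `window` tokens on either side of `keyword`."""
    counts = Counter()
    for text in records:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(n for n in neighbours if n not in stop_words)
    return counts


# Invented sentences standing in for trial texts.
sample_records = [
    "The prisoner gave her some coffee and she drank it and was taken ill.",
    "He drank the coffee, which tasted of poison.",
    "I drank some tea that morning.",
]

if __name__ == "__main__":
    print(word_frequencies(sample_records).most_common(5))
    print(collocates(sample_records, "drank", window=3).most_common(5))
```

With a few hundred real records in place of the sample list, the second call is roughly the “what appears near drank?” question asked above.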

Of course it’s no surprise that a strong, bitter drink would be a vehicle for administering poison. Or perhaps cheap coffee tasted so bad that someone who fell ill for any reason after drinking it could feel as if they had been poisoned by it. Perhaps more significantly, I also learned that “ate” or “eaten” appears hardly at all near the word “poison”.

Are these revolutionary historical observations? No. But that’s not my goal here. Rather, I’m just trying to get an informal sense of the context in which poison is being discussed. Had I read through these trial records individually, I might have noticed the frequent mentions of coffee at some point. But then I would need to go back through all other records to see what I’d missed. My sample here is only a few hundred records; the task of rereading the archive doesn’t really scale to several thousand or more records, at least for practical purposes.

Although obviously not groundbreaking, the “coffee” discovery does say something interesting and prompts other questions. Maybe I should search for coffee in trial records from other archives. Are there other documented connections between coffee and poisoning? I certainly have more questions than answers. More importantly, I was able to garner historical evidence (granted, from just one source in this case) about the nature of poisoning in London in just a few minutes of playing around. One could minimize this discovery by claiming its obviousness. But such an attitude discards the distinction between a priori assumptions and evidence-based historical interpretation.

Obviously, this post really isn’t about coffee. It tries, rather, to show that the learning curve for some quick inquiries into the archive is neither long nor steep. While nothing I’ve done here could be considered technically sophisticated, it is revealing. And it shows that it’s never been easier to get a rough sense of what an archive or set of texts can tell us, especially when focusing on one particular context at a time and thus sidestepping the daunting complexities of dealing with much more data all at once.

A Related Project and Some Caveats

November 30th, 2010

Those following the Criminal Intent project may wish to experiment themselves with a related project that does text mining on Victorian books. In particular, the team at Mason plans to compare some of the topic graphs from the nineteenth century with subject matter in the Old Bailey. And while you’re exploring some of those graphs, be sure to read the caveats, which likely apply here as well.

The Old Bailey API – progress report.

October 8th, 2010

We now have a demonstrator site for the Old Bailey API that will form the basis for the ‘Newgate Commons’. Thanks to the hard work of Jamie McLaughling at the HRI in Sheffield, the demonstrator is now fully functioning, and we hope to make it available in a more robust version for public use within the next couple of months.

As it stands, the demonstrator allows queries on keywords and phrases, and on structured and tagged data. The results can be generated either as a search URL or as a Zip file of the relevant trial texts. The basic interface also allows the user to build a complex query and specify the output format.

Old Bailey API, Demonstrator - search screen and sample results.

The demonstrator also allows the search criteria to be manipulated (‘Drilled’ and ‘Undrilled’), and for the results to be further broken down by specific criteria (‘Broken Down by’).

Old Bailey API, Demonstrator, with 'Broken Down by' implemented.

The demonstrator creates a much improved server-side search and retrieval function that generates a frequency table describing how many of its hits contain specific ‘terms’ (i.e. tagged data from the Proceedings, such as verdict). It is fast and flexible, and will form the basis for exchanging either full files or persistent address information (via a Query URL) with both Zotero and TAPoR tools.
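To make the ‘Drilled’ and ‘Broken Down by’ ideas concrete, here is a rough sketch of the underlying logic, assuming each hit is a small record of tagged fields. This is not the demonstrator’s code or its response format. The field names, identifiers, values, and both function names are hypothetical.

```python
from collections import Counter


def drill(hits, field, value):
    """Narrow a result set to the hits whose tagged `field` equals `value`."""
    return [hit for hit in hits if hit.get(field) == value]


def break_down_by(hits, field):
    """Build a frequency table: how many hits carry each value of `field`."""
    return Counter(hit.get(field, "unknown") for hit in hits)


# Invented hits; real trials carry many more tagged fields.
sample_hits = [
    {"id": "example-1", "offence": "theft", "verdict": "guilty"},
    {"id": "example-2", "offence": "theft", "verdict": "notGuilty"},
    {"id": "example-3", "offence": "killing", "verdict": "guilty"},
]

if __name__ == "__main__":
    # Frequency table of verdicts across all hits.
    print(break_down_by(sample_hits, "verdict"))
    # Drill down to theft trials, then break those down by verdict.
    thefts = drill(sample_hits, "offence", "theft")
    print(break_down_by(thefts, "verdict"))
```

Doing this on the server, as the demonstrator does, means a client such as Zotero only has to pass along a query rather than download every trial to count them.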

The Old Bailey in Numbers

June 18th, 2010

Datamining is about discovering patterns in text, but the Old Bailey Proceedings already incorporates tagged data reflecting what contemporaries thought they were doing. The nature of the crime, the name, gender and age of the defendant, the verdict and punishment were described in words their authors thought beyond misinterpretation. To use datamining to find new patterns, it would help if we could subtract the patterns that we already know about. The huge rise in theft prosecutions in the first half of the nineteenth century, the changing proportion of men and women prosecuted, the evolving nature of the crime itself; each needs to be interrogated to illustrate where changes in language can be explained as the result of changing judicial practice, and where these changes suggest a new and different explanation.

Voyeur and Old Bailey

April 15th, 2010

Screen shot of Voyeur with Old Bailey data

With Criminal Intent has connected Voyeur with the Old Bailey Online project in a preliminary prototype. Click here to try Voyeur with a subset of the full Old Bailey Corpus.