About

Summary

The Old Bailey Proceedings reflect evidence given in almost two hundred thousand trials at London’s central criminal court over a period of 240 years.  They comprise 120 million words of structured text, the largest coherent body of printed descriptions of behaviour ever published in any form.  Despite a wealth of scholarship, however, historians have tended to approach this collection by sifting through the documents one at a time, using unusual or compelling stories to illustrate social, cultural or intellectual histories.  We intend the Criminal Intent project to demonstrate potential roles for text mining in historical practice, showing that greater historical rigour can be achieved, and new insights gained, by moving from a single trial or narrow run of relevant examples to an analysis of statistically significant textual patterns found in this source as a single, massive whole.  In addition to the Old Bailey Proceedings, our work builds on the successes of Zotero virtual collections and TAPoR and Voyeur analytics. (See Partner Projects below for more on these.)

Please also see the list of participants.

Statement of Significance

Over the past few decades scholars have increasingly used court records to illuminate historical themes in novel ways. Topics as varied as the changing status of women in the seventeenth century, the organization of species in the nineteenth century, and the rise of oppositional teen culture in post‐War America have all found new interpretations through the detailed analysis of legal documentation. The published Proceedings of the Old Bailey have been a fertile source for scholars working in these varied traditions, allowing them to use both qualitative and quantitative approaches to the evolution of the criminal justice system, of interpersonal relationships and human behaviour more generally (cf. Beattie 2001; Gatrell 1994; Linebaugh 1991; Shoemaker 2004; Trumbach 1998; Hitchcock 2004; Wise 2004).

Even though the 120 million words of court transcripts published in the Proceedings are now available online in a structured and searchable form, the recent use of these legal records remains essentially iterative and traditional. By bringing together in one place (the “Newgate Commons”) the text of the Proceedings, the functionality of Zotero and the tools created by the TAPoR project, it is now possible to take a new approach to this old source. Some of the questions that can be newly addressed have already been asked by historians, but never fully answered because of the limitations of traditional research methods. Other interpretations of the vast legal corpus await a new toolkit that allows for intelligent prospecting at scale, the discovery of hidden patterns, and detailed computer‐assisted analysis of selected portions of the record. What if scholars, including those with few technical skills, had at their disposal a way to mine the Proceedings of the Old Bailey and similar digitized archives?

This project will create an intellectual exemplar for the role of data mining in an important historical discipline, the history of crime, and illustrate how the fundamental conundrums of historical research on large bodies of text that have dogged humanist research over the last forty years might be addressed. Historical trial reports are a constantly changing generic form (evolving from a few lines of text intended for a popular readership in the 1670s to tens of pages of formal legal reporting by the mid‐nineteenth century), and at the same time represent the end point of a continuously changing and complex legal process of arrest, accusation and conviction. As a result the final trial text incorporates a series of patterns that respond both to the workings of the criminal justice system and to the evolution of a genre. Because of their sheer volume, these records also contain incidental descriptions of, and references to, a wide range of non‐criminal behaviour. Historians of crime are aware of these issues and opportunities, but have no way of mapping the patterns embedded within the text, or of discovering ‘non‐criminal’ descriptive elements in a consistent way.

By allowing the analysis and statistical representation of the types of language used in court and how it changed over time, and by comparing these ‘data mined’ patterns to those found in tagged data, With Criminal Intent will achieve two things. First, a whole new way of charting changes in crime reporting and prosecution will be created; and second, a new methodology for the consistent discovery of related descriptions will be benchmarked. The significance of this project therefore runs beyond the discipline of the history of crime, and addresses historical scholarship more broadly, and scholarly engagement with large corpora. It will allow scholars to move beyond the Enlightenment methodology of textual collection and comparison to a structured analysis of layers of patterned meaning as found in specific and distinctive forms of historical representation.


Partner Projects


With Criminal Intent involves three partner projects: Zotero, the Old Bailey Online project, and the TAPoR project.

Zotero (Cohen and Takats): The Center for History and New Media (CHNM), where Cohen is Director and Takats is Director of Research, has fifteen years of experience building educational and scholarly resources, collections, and tools online. Of particular relevance to this project, it has shown great strength in producing and maintaining end‐user software such as Zotero, which has been downloaded over 2 million times. CHNM has a staff of fifty programmers, designers, historians, and researchers that it can draw on for this project.

Old Bailey Online (Hitchcock and Shoemaker): The Old Bailey Online team and the Humanities Research Institute (HRI) have worked together for ten years, creating and publishing large data sets, including the Old Bailey Online, and developing Natural Language Processing techniques to automate tagging and refine search methodologies. The directors of the Old Bailey Online (Hitchcock and Shoemaker) are both leading historians of eighteenth‐ and nineteenth‐century Britain, who have consistently pursued a historical approach informed by the possibilities created by the use of new technologies.

TAPoR project (Rockwell and Sinclair): TAPoR is a consortium of six Canadian universities that has provided infrastructure for some 55 projects involving over 150 researchers. One of the most significant outcomes of TAPoR has been the development of a unique Portal for scholars working with digital texts, as well as a diverse array of web‐based text analysis tools that can be invoked on the texts (TAPoRware, HyperPo, and Voyeur). Development of the Portal and tools has been led by Rockwell and Sinclair, who have gained valuable expertise in the interoperability of web‐service tools and in responding to the needs of humanities scholars. More recent efforts have focused on enabling large‐scale text analysis (from a humanities perspective), leveraging the potential of high performance computing. Joerg Sander of the University of Alberta joins this team to help develop text mining modules for Voyeur.













