Monday, October 14 • 4:00pm - 4:15pm
Middle-Distant Reading: Big Data Meets Big Humanities Scholarship



Computational Textual Analysis (CTA) is a controversial sub-field in the digital humanities. Previous work has shown that critics misunderstand the field’s objectives but also that practitioners overstate their results. Using the JSTOR/Folger Shakespeare dataset, I demonstrate that supercomputing on big humanities data can help both camps to understand one another.
I designed and optimized two jobs: an HPC job, written in Python with OpenMPI, that transformed the Shakespeare dataset from a citational network with 623,428 edges into a co-citational network with 29,256,101 entries (5,080 CPU hours); and an HTC job (16,000 CPU hours) that reduced the resulting network for use in a Shakespeare recommendation system.
The resulting recommendation system powers a simple intertextual reading interface. Its recommendations qualitatively outperform standard CTA approaches. Computational analysis of humanities data can make better use of big data, especially the quantifiable interpretive activities of trained practitioners.
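
The citation-to-co-citation transformation described in the abstract can be illustrated with a minimal single-process sketch. It is not the OpenMPI job from the talk. It assumes, hypothetically, that each citational edge is a `(citing_document, cited_passage)` pair and that two passages are co-cited whenever one document cites both. The function names `cocitation_counts` and `recommend` are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cocitation_counts(edges):
    """Count how many documents cite each unordered pair of passages.

    `edges` is an iterable of (citing_document, cited_passage) pairs.
    This edge format is an assumption, not the dataset's actual schema.
    """
    # Group the cited passages by the document that cites them.
    passages_by_doc = defaultdict(set)
    for doc, passage in edges:
        passages_by_doc[doc].add(passage)

    # Every pair of passages cited by the same document is co-cited once.
    counts = Counter()
    for passages in passages_by_doc.values():
        for a, b in combinations(sorted(passages), 2):
            counts[(a, b)] += 1
    return counts


def recommend(counts, passage, k=3):
    """Return up to k passages most often co-cited with `passage`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == passage:
            scores[b] += n
        elif b == passage:
            scores[a] += n
    return [p for p, _ in scores.most_common(k)]


# Toy data: two documents, three passages (all identifiers are made up).
edges = [
    ("doc1", "A"), ("doc1", "B"),
    ("doc2", "A"), ("doc2", "B"), ("doc2", "C"),
]
counts = cocitation_counts(edges)
top = recommend(counts, "A", k=1)
```

Because a document citing n passages contributes n(n-1)/2 pairs, the co-citational network can grow far larger than the citational one, which is consistent with the jump from 623,428 edges to 29,256,101 entries reported above.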


John Mulligan

Presenter, Rice University

Monday October 14, 2019 4:00pm - 4:15pm CDT
BRC 280
