Man looking backwards through binoculars

Distant reading is a type of literary analysis born in the age of digital humanities. It contrasts with “close reading,” the historically dominant approach, in which a scholar examines an individual text closely to dissect its main themes. Distant reading, pioneered by scholar and professor Franco Moretti, flips the equation: it takes large amounts of information from multiple texts and feeds them into a computer program to analyze trends across the larger body of works. It is much like the way meta-analysis operates in the sciences.

Elizabeth Callaway and others use this method to try to home in on the definition of “Digital Humanities” as a discipline, something that is oft discussed but not quite concretely understood by many. In this “distant reading,” the authors looked at how other authors talk about digital humanities. Working from the corpus, or body, of texts, they examined 55 salient topics and the words that were often collocated. It seems that one of their major takeaways is that there are more male authors on the subject, and that women’s voices were “underrepresented.” This feels a little unsatisfying to me as a major conclusion, when the apparent goal is to better understand the subject and how people perceive it. The conclusions seem peripheral to the study’s goals.

It does seem that revealing gender imbalances and a lack of diversity and inclusion is often the ambition of distant reading. However, this was not the goal of Shawn Martin’s research in topic modeling of scientific journals. Martin explored the evolution of journals through their publications, showing their shift from more news-based articles to sources of original research within very specific disciplines. So distant reading has many uses, and ultimately it does help historians understand how people talked about specific topics over time or during a specific period, as the case may be.

For our technical activity, I played around with an application called AntConc, a digital concordance application that can be applied to any text file. It is a freeware corpus analysis tool that analyzes input text files for word frequency and collocated words. For our sample dataset, we looked at movie reviews and compared the incidence of different words and the words that most frequently surrounded them. When I compared “women” with “men,” both were used with about the same frequency, and both were collocated most often with small, non-descriptive words. When it came to descriptive adjectives, however, the top words for men were young, old, and black. For women, the top adjectives were young, pretty, and beautiful. The women’s results don’t seem surprising, but the men’s results seemed curious to me. Still, collocation only tells you which words appear next to a given word; it omits the wider context. I assume the point is to examine the way we talk about things, but even so I’m reluctant to draw firm conclusions.

When I examined term papers I had written in previous graduate art history classes, the results weren’t surprising. For a paper I wrote about Sofonisba Anguissola, the top nouns collocated with her name were work, Vasari, and portrait. For another paper I wrote about Giotto, the top nouns were chapel, Vasari, and frescoes. Am I seeing a trend? Why yes: the work of Renaissance artists affiliated with Vasari. Which makes sense, since that is pretty much my specialty!
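For anyone curious what a word-frequency and collocation count looks like under the hood, here is a minimal Python sketch. It is not AntConc's actual method, just a simplified illustration. The function names (`tokenize`, `collocates`), the one-word window, and the sample sentence are all my own assumptions for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a word is any run of letters or straight apostrophes.
    Real concordance tools use more careful tokenization rules.
    """
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, window=1):
    """Count the words that appear within `window` positions of `node`.

    `window=1` means only the immediate left and right neighbors,
    which is a hypothetical default chosen for this sketch.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Made-up sample text, not taken from the movie review dataset.
sample = "The young man met a pretty woman. The old man left."
tokens = tokenize(sample)

word_frequency = Counter(tokens)          # how often each word occurs
man_neighbors = collocates(tokens, "man")  # words directly beside "man"

print(word_frequency.most_common(3))
print(man_neighbors.most_common())
```

Note that this simple version counts neighbors across sentence boundaries and makes no attempt to separate adjectives from other words, which is part of why raw collocation counts leave out so much context.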


Elizabeth Callaway et al., “The Push and Pull of Digital Humanities: Topic Modeling the ‘What is digital humanities?’ Genre,” Digital Humanities Quarterly 14, no. 1 (2020).

Shawn Martin, “Topic Modeling and Textual Analysis of American Scientific Journals, 1818-1922,” Current Research in Digital History 2 (2019) OR Peter Carr Jones, “Macroanalysis of the Indian Claims Commission,” Current Research in Digital History 1 (2018).