Last week we talked about understanding the field of digital humanities, and part of the struggle within the field is gaining recognition for the work being done in it. Christof Schöch discusses this in the Journal of Digital Humanities and asks why this is so. He cites researcher and practitioner Johanna Drucker, who argues that the term “data” is inadequate for work in the humanities, so much so that she coins the word “capta” to better convey the active work of “capturing” information that practitioners do. This resonates with Dr. Otis’s comment that some colleagues don’t consider her research to be real “work.” I think Drucker is right that “data” and its associated “work” tend to connote independent observers merely “observing,” versus the more active work of synthesizing, analyzing, and drawing conclusions, which is the work of digital humanities.
This data-capture work of digital humanities is significant, and it takes time and tinkering. Schöch discusses these types of data, points of information that have been digitized and captured from many different sources, which need to be organized constructively in order to make sense of them and draw conclusions. I understood this more fully when we were given our own dataset of biographical data on people from 16th-century Wales. The dataset was definitely messy; you could see different conventions for recording the same information throughout the columns. For example, a given record of birth/death could be written “1533-1600,” “b. 1533,” “d. 1600,” or “b. 1533 - d. 1600.” You can quickly see that arranging this messy data, particularly when there are over 13,000 records, would take a long time to do by hand. With the use of technology, in this case the application OpenRefine, we can manipulate large amounts of data so that it can then be analyzed. For our exercise we had to clean the messy data, find the people who were born the same year as Queen Elizabeth, and count how many were born before the start of the first plague. I found that easier than trying to parse out the difficult Welsh names, with their varying ways of entry, prefixes, titles, and whatnot. In the end, another challenge of working with big data is the uncertainty over whether you’ve in fact done it correctly, since you can’t easily “check your work” on 13,000 entries.
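To give a sense of the kind of parsing this cleanup involves: in class we did this inside OpenRefine with facets and transformations, but the same logic can be sketched in a few lines of Python. The sample strings, the function name, and the little list of records below are my own hypothetical illustrations, not the actual dataset or its column names; the only fixed point is Queen Elizabeth I’s birth year, 1533.

```python
import re

# Hypothetical examples of the date conventions seen in the dataset;
# the real records have many more rows and other quirks.
records = ["1533-1600", "b. 1533", "d. 1600", "b. 1533 - d. 1600"]

def extract_birth_year(value):
    """Return the birth year from a messy date string, or None if none is recorded."""
    value = value.strip()
    # "b. 1533" or "b. 1533 - d. 1600": the year after "b." is the birth year.
    m = re.search(r"b\.\s*(\d{4})", value)
    if m:
        return int(m.group(1))
    # "d. 1600" alone records only a death year, so no birth year is known.
    if value.startswith("d."):
        return None
    # "1533-1600": treat the first year as the birth year.
    m = re.match(r"(\d{4})\s*-\s*\d{4}", value)
    if m:
        return int(m.group(1))
    return None

# Count people born the same year as Queen Elizabeth I (1533).
born_same_year = sum(1 for r in records if extract_birth_year(r) == 1533)
print(born_same_year)
```

The point is not the code itself but the idea behind it: every recording convention in the column needs its own rule, and any convention you miss silently drops or miscounts records, which is exactly why it is hard to be sure you have “done it correctly” across 13,000 entries.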
Beyond bias against the work of digital humanities practitioners, Blaney and Siefring (in Digital Humanities Quarterly) have noted a bias against citing digitized sources in humanities research. They question the reasoning behind this, as people cling to much older print sources despite the authors’ view that newer digital sources are more robust. To illustrate, they criticize the dependence on the Oxford English Dictionary versus the more verbose Wikipedia. To me this underscored the differences between print and digital media and why those concerned about the instability of language rely on print sources. The impetus behind the brevity of the OED is rooted in the need to create a handheld dictionary of English words. Wikipedia can devote a whole page to the definition of the word hubris, while the OED cannot afford to do that with every single word without losing practicality and money. The OED is solidified, stable; Wikipedia may expound on meaning, but it is more slippery, as anyone may edit it at any time.
Rebecca
September 7, 2020 — 12:06 am
Nicole, yes, the variety of answers we are all arriving at for the data makes me wonder what research protocols a historian would need to follow to verify their data analysis. Regarding the OpenRefine data, at least the Welsh names did not seem to affect the result for the most popular given name, but there was still messiness, and easy-to-miss kinds of messiness, that could significantly alter the results given how large the dataset was.
Jayme Kurland
September 7, 2020 — 12:48 am
Great points, Nicole! I found Blaney and Siefring’s points about Wikipedia so interesting, and timely. At one point, Wikipedia was an unacceptable source because anyone can edit a page without formal review. That said, Wikipedia is most people’s first stop when it comes to just about anything. I often rely on the citation section of a Wikipedia article to get started on a new project. And many libraries and universities hold “wiki-a-thons” to edit pages as a group. I wonder how Wikipedia could become more reliable and accepted as a source, perhaps with author citations, since that information is available when one looks at the edit history. As public historians, we could hardly ask for a more public, popular, and free source than Wikipedia!
Cassandra
September 8, 2020 — 4:47 pm
Nicole, thank you for writing this post. After working with OpenRefine this past week and wrestling with “messy” data, I wonder if a semester course in Microsoft Excel should be required for all history majors. The profession is undergoing change, particularly with the “digital turn,” and future scholars will need to be able to save and preserve data, even “messy” data.
As for those historians who are reluctant to cite digital sources in their papers, I have a mixed response. I wrote a paper a few years ago, and our library didn’t have the published journals, but one of the databases to which we subscribe did. As I cited these sources, I contemplated just giving the traditional bibliographic information. I decided to cite them as digital sources because it would not have been completely honest to state otherwise. As we continue to rely more on digital resources, I expect better citation methods will be created and the companies providing access to these records will become more in tune with their patrons’ needs.
On another note, I have seen Wikipedia cited in peer-reviewed articles, and I suspect that this will increase.