CBR: The Dangerous Art of Text Mining Chapter 2: From Fantasy to Engagement: Channeling the Potential of ‘Hybrid’ Teams.

Jo Guldi. The Dangerous Art of Text Mining: A Methodology for Digital History. Cambridge: Cambridge University Press, 2023.

Chapter Reviewed: Chapter 2: From Fantasy to Engagement: Channeling the Potential of ‘Hybrid’ Teams.

Review by: Elizabeth Varkey.

Introduction

The second chapter of Jo Guldi’s work, titled “From Fantasy to Engagement”, begins by returning to the Chinese parable of the monkey king, Sun Wukong, that appeared in the book’s Introduction, reinstating the parallel with the powerful data scientist, a “daring if indiscriminate wielder of technology” (Guldi, 58). Throughout the text, Guldi uses parables, stories and anecdotes to make her point. In a world flooded with data – dirty and clean – stories seem to be an essential tool in the humanist’s armory. I am reminded of the famous TED talk by the Nigerian writer Chimamanda Ngozi Adichie titled “The Danger of a Single Story”. While Adichie insists on the need for several stories to arrive at a better understanding of reality, the researcher must insist on as representative an archive as possible before commencing his or her analysis. The selection of the archive, as Guldi points out at the end of chapter 1, would benefit from the involvement of a humanities researcher in the early phases of the project. With a strong background in the “interpretative arts”, such a person would be well equipped to handle the “dangers of occlusion and dirty data” (Guldi, 55).

Five Fundamental Precepts 

In this chapter, Guldi lays down a set of five precepts that she then proceeds to unpack one by one. These precepts are designed to help those from “the world without history” (Guldi, xvi), the term Guldi employs throughout the book to refer to data scientists and analysts who utilize data obtained through text mining of historical archives without fully understanding the context of its production or paying heed to the occlusions of the archive. She underlines the importance of adhering to these precepts in order to make sense of “the masses of text about human experience present and past” (Guldi, 58).

Precept #1: Smart Data Is Data Whose Origin Is Specified, Where the Analysts Are Aware of Shortcomings and Delimitations

Guldi warns us of the dangers of treating any database as an “innocent record of history, to be mined and interpreted, free of context, and used to advise judges, prisons, university admissions personnel, or public-school administrations”. She insists that “before we use data to describe or advise, we must know what kind of data we’re dealing with” (Guldi, 61). 

Time and again, Guldi insists on the need for an interdisciplinary approach to data that is capable of yielding new, “hybrid” forms of knowledge. She provides convincing instances of data interpretation gone wrong, drawing upon her own experiences. As a lecturer, she was astounded when a group of her students used text mining methods to track references to “ignorant women” in Parliamentary archives and arrived at the absurd conclusion that women had grown more ignorant over the course of the century. As a researcher, she was herself “drenched in the fantasy of unoccluded archives” (Guldi, 62). While her humanities training enabled her to re-orient her students’ analysis, Brian Croxall, a digital librarian, helped her recognize the limitations of her research ideas and narrow down her dataset.

Precept #2: The Building Blocks of Historical Analysis Reward Inquiry

The author provides a glossary of important terms that functions as a useful primer for data scientists handling historical data. The glossary lists “six fundamental building blocks of historical knowledge” (Guldi, 65), namely archive, change over time, event, period, influence and memory. While Guldi provides a quick but nuanced explanation of each concept in this chapter, she promises to delve much deeper in the succeeding chapters. Guldi believes that a basic understanding of these building blocks would help data scientists comprehend the major concerns of historians and provide them with relevant and meaningful data. As a literature scholar, I found myself thinking about the key terms I would draw up if I ever wished to explain literary studies to a data analyst. Perhaps terms like genre, plot, setting, characters, figurative language and style would appear on the list, though it would require much more deliberation and refinement.

Further, Guldi sets out a list of possible uses of the work done by data scientists on historical data. For instance, algorithmic analysis of the past could be used to create a “news tracker that informs students about the most unusual developments in law, rather than the most canonical ones or the ones linked to influential figures” (Guldi, 72). However, she is quick to point out that biases may still persist in “data-backed views of history” (ibid).

Precept #3: Critical Thinking Is Most Powerful When Applied to Each Part of the Research Process

Guldi insists that “raw word counts” and other quantitative methods serve merely as groundwork for investigating the “relationship between any given textual corpus and temporal experience” (Guldi, 75) and must necessarily be accompanied by an “equally deep engagement with historical context” (Guldi, 73). After briefly describing some common analysis methods, she explains how each method “gathers its respective biases” (Guldi, 76). She concludes that the interpretation of data is as important as the selection of tools and models appropriate to one’s research questions (ibid).

While “contemplating the fit between data and analysis”, she advocates for a process of “critical search” that “is more than just a button we keep pushing until we get results; rather, it’s a flexible and extensible set of guidelines for contemplating and bridging the expanse between humanistic thinking and algorithms” (Guldi, 79). She explains that this approach calls for “critical thinking at every stage of analysis: in the choice of data, the cleaning of data, the choice of data model, the variety of algorithms, and the reading and interpretation of results” (ibid). Drawing upon personal examples of overcoming “dead-ends” in her research journey, Guldi argues that in order to arrive at useful findings, researchers must work in hybrid teams that facilitate critical search.

Precept #4: Discovery Happens Where Old Fields Meet

Citing examples of successful digital humanities research today, such as “the Lab for Social Minds, run by Simon DeDeo at Carnegie Mellon University” (Guldi, 87), Guldi underscores the power of interdisciplinary teams. However, she prefers the term “hybrid”, with its etymological roots in agricultural practices, over the term “interdisciplinary” because she believes that the latter is too generalized. Contrasting the two terms, she states: “I can be interdisciplinary in solitude, reading books on art history and writing the ideas into my paper on history. But hybrid teams require ongoing support and thinking between people trained in, and who identify as members of far-flung disciplines” (Guldi, 92). This got me thinking about all the times I have been ‘interdisciplinary’ in my research and pedagogy but have failed to be truly ‘hybrid’, neglecting potential collaborations with my colleagues from other departments. With their technical know-how, they could have saved me hours of labour spent tinkering with text analysis tools, like Voyant and AntConc, that I am not so familiar with. I realize now that, rather than working in a silo, I would have made more fruitful efforts had I reached out to colleagues from the neighbouring Data Science Department, who would likely be adept at finding their way around such tools.

Guldi insists that “hybrid knowledge” is not something novel, but harks back to the lives and works of eighteenth- and nineteenth-century mathematicians and poets such as Ada Lovelace and Augustus De Morgan (Guldi, 93). She concludes this section with a list of shibboleths or guiding principles “that divide communities from talking about past experience” and goes on to explain this in further detail in her last precept.

Precept #5: Shibboleths Mark Out Contested Areas of Practice where Interdisciplinary Work Must Proceed with Care

Guldi’s table of shibboleths comprises data science jargon such as “prediction, laws of historical change, future performance, new data, test and training data and sampling data” that immediately distances its users from “historical practice” (Guldi, 95). She explains that since historians and data scientists think with different halves of the brain, a word like “prediction” would send them off in different directions, with the historian burdened by ethical dilemmas and the data scientist preoccupied with profit dimensions.

Rather than presenting a romanticized picture of hybrid teams, working together in perfect complementarity, the author briefly delves into possible conflicts that can emerge. She points out that at times what may appear “obvious” to the data scientist might look “naïve” to the historian, leading to feelings of frustration and alienation amongst the team members. Hence identifying and shunning these shibboleths is a prerequisite to working successfully in hybrid teams. 

Conclusion

Guldi achieves the seemingly impossible feat of ensuring that the historians and humanists who read her work aren’t intimidated by the technological bits while the data scientists and analysts don’t feel bombarded with humanistic methodologies. Perhaps what helps her achieve this is her own interdisciplinary training – a coder not alien to or divorced from the world of history. The five precepts outlined in this chapter collectively challenge traditional research paradigms and signal a shift in scholarly practice.

Drawing upon the analogy of mining – in order to turn data into meaningful nuggets of information – Guldi’s work convinces me that the need of the hour is a veritable tribe of ‘alchemists’ who can skillfully combine the ‘science’ of data extraction with the ‘art’ of critical interpretation. Reading her work led me to revisit an article written by Dr John Kennedy, Dean, School of Arts and Humanities at Christ University. In what sounds like a plea for better sense to prevail, the article is titled “Don’t Drop Humanities When We Need Them Most”. The author remarks ironically that while humanities departments – such as the English literature degree at Canterbury Christ Church University – are closing, the critical skills that humanities training imparts have never been more urgently required. He reflects wryly: “In an era of fake news, polarised debates and polycrisis, the ability to think critically, communicate effectively and understand diverse perspectives is indispensable”. Guldi goes a step further when she suggests that rather than dropping the humanist for the data scientist, both must work together to ensure that the limitations and occlusions of datasets are understood and interpreted responsibly. Such collaborations that cut across disciplines, especially through hybrid teams, will help researchers navigate the ethical, interpretive, and methodological complexities of digital historical analysis.

References

Adichie, Chimamanda Ngozi. “The Danger of a Single Story.” TED Talk, YouTube Video, 7 Oct. 2009, www.youtube.com/watch?v=D9Ihs241zeg.

Guldi, Jo. The Dangerous Art of Text Mining: A Methodology for Digital History. Cambridge University Press, 2023.

Kennedy, John J. “Don’t Drop Humanities When We Need Them Most.” The New Indian Express, 7 Jan. 2025, www.newindianexpress.com/opinions/2025/Jan/07/dont-drop-humanities-when-we-need-them-most. Accessed 8 Jan. 2025.
