Jo Guldi. The Dangerous Art of Text Mining: A Methodology for Digital History. Cambridge: Cambridge University Press, 2023.
Chapter Reviewed: Chapter 7: Of Memory
Review by: Rimi Nandy, PhD.
In Chapter 7, titled “Of Memory”, Jo Guldi discusses the relationship between data and the broader field of Memory Studies. Through various case studies, the chapter focuses on the way data is presented and represented in the construction of collective memory. She also traces the way memory is reconstructed in every age, gazing back into the past while reshaping how that past will be perceived in the future. Guldi applies several text analysis approaches using the Python library SpaCy [1]. The chapter discusses various approaches that combine textual analysis to understand history, while at the same time pondering the improbability of identifying a single strand of history. It also introduces varied ways of engaging with the perception of time through memory and its rootedness within the space of history.
The chapter is subdivided into six sections, which present different approaches to, and aspects of, understanding history through various strands of text analysis. The focal point of the chapter is the analysis of parliamentary speeches across the ages to understand the “attitude of society to their past”. Guldi’s primary question is the feasibility of treating textual analysis as a “predictive science” based on models of historians’ knowledge. Through her discussions, the author emphasizes that narratives are structured to represent a prevalent idea which suits those in power. While textual analysis helps in understanding the various topics which are discussed in parliamentary speeches, a larger number of topics never find their way into the discussions. The relevance and social significance of the Irish Famine, for example, cannot be gauged from the fact that it is referred to in parliamentary speeches far less often than the Crimean War.
Among the several examples given, a primary one is that of the European colonizers creating a narrative that portrays the Mughal Empire in 19th century India as tyrannical, thereby justifying their own acts of colonization. Guldi refers to Hobsbawm’s concept of “invented traditions”, which is intrinsically linked to the repetition of words and phrases to construct a dominant narrative. In a similar vein, the presentation of a nation as predominantly connected to white identity cannot be established on the basis of repetition alone. The author cautions about the limitations of textual analysis in reaching a definite conclusion regarding a single and true history. Whether the repetition of words and phrases signals ‘consent’ or ‘dissent’ cannot be identified through patterns of repetition and singularity (p. 214).
Guldi gives a detailed study of parts-of-speech (PoS) patterns using SpaCy, a popular Python library widely used in natural language processing. She also employs specific text mining techniques, such as identifying and analyzing four-digit numbers within the texts, which she interprets as references to historical years. This approach enables her to map the temporal contours of parliamentary discourse, revealing which periods are foregrounded and which are neglected. Her methodology includes parsing these numbers to identify patterns of historical emphasis and silence, thereby illustrating how collective memory is shaped through selective temporal references. For instance, her findings suggest that the frequency of mentions of certain years can reflect broader political narratives or the erasure of specific historical events. One particularly illustrative example is the debate in Parliament over Samuel Pepys’ diary in 1832, where members invoked his writings to support conflicting political agendas. This instance underlines the complexities of historical reference, where appeals to memory could be challenged, disputed, or misquoted. Beyond numeric dates, Guldi also explores references to named events like the “French Revolution” or “Boer War,” using named entity recognition (NER) to identify and categorize these occurrences. She acknowledges, however, the limitations of this method: ambiguities in naming, overlapping terminologies, and the labor-intensive task of cleaning such data complicate any effort to arrive at definitive interpretations. Accompanied by visual representations of the data, her analysis opens up avenues for exploring the fluidity of historical time, perceived simultaneously through retrospective and prospective gazes. She concludes the chapter with a reference to Paul Gauguin’s painting titled “Where Do We Come From? What Are We? Where Are We Going?”.
The manner in which temporality collapses on the canvas, intermixing colonialism, personal tragedy, French identity in contrast to the Tahitian representation, and the weaving together of the biblical story of the Garden of Eden and the stages of life experience, deftly conveys the chapter's core argument. Like the painting, which depicts several lives and the perception of them, text analysis of historical documents can never provide a true picture of the past, as the past is a conglomeration of multiple pasts and their perception through time and in time. The chapter brilliantly connects to the central focus of the book The Dangerous Art of Text Mining.
[1] The author refers to SpaCy as software, which could be seen as a much broader term. SpaCy's specific function centres on its role as a Python library which helps in identifying patterns in text, such as parts of speech (PoS).
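
The four-digit-number technique described in the review can be illustrated with a minimal sketch. This is not Guldi's code: it uses only Python's standard library rather than SpaCy, and the function name count_year_mentions, the 1000 to 2099 range, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Treat a standalone four-digit number between 1000 and 2099 as a
# candidate year. The range is an assumption of this sketch.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def count_year_mentions(speeches):
    """Count how often each candidate year appears across a list of texts."""
    counts = Counter()
    for text in speeches:
        counts.update(int(year) for year in YEAR_PATTERN.findall(text))
    return counts


# Invented sample sentences, not quotations from the parliamentary record.
speeches = [
    "The settlement of 1688 is invoked again, as it was in 1832.",
    "Since 1832 the franchise has widened; in 1688 no one foresaw it.",
    "The sum voted was 12000 pounds, and 1847 was a year of famine.",
]

year_counts = count_year_mentions(speeches)
for year, n in sorted(year_counts.items()):
    print(year, n)
```

A tally of this kind shows only which years are named and how often. A four-digit number may equally be a sum of money or a quantity, so candidate years would still need checking by hand, and frequency alone cannot show whether a reference signals consent or dissent.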