Text Analysis: Underwood, Brett, and Posner

Brett and Underwood make many of the same distinctions in their discussions of text analysis, or text mining. Both emphasize that the tools of text mining are usually applied to very large collections of works and data. Both also stress that some level of human intervention is inevitable. In other words, both acknowledge that these tools are fallible in some areas and that the human eye is necessary to identify obvious errors. Finally, both provide examples of visualizations that can be used to display models.

The main idea I got from these two views of quantitative text analysis was that it helps reveal new kinds of connections between texts that would otherwise be impossible to see given the limitations of human memory. In other words, text analysis offers a “distant reading” of texts that allows us to better understand collections of works as they relate to each other, and that can even strengthen our “close reading” of texts by revealing clusters of words or topics that we would not have otherwise known were present.

One question I have about text analysis is the difference between text mining, as described by Underwood, and topic modeling, as described by Brett. My current perception is that topic modeling is a subset of text mining, and that text mining in turn is roughly equivalent to quantitative textual analysis. I wonder if this is an accurate assessment of the relationship between the two methods. I also wonder what the distinctions are between the various tools mentioned in both articles (Python, R, MALLET, etc.), and why it is important that there are so many different tools for text analysis.

I think one reason this may be important is that, as Underwood notes, text mining is more “exploratory” than “probative,” and thus the use and application of multiple tools can potentially lead to more new findings. A main feature of text mining seems to be that you cannot be sure what you are going to get. In some sense, then, I feel text analysis is a good area in which to meet Miriam Posner’s call for us to challenge the underlying assumptions of our digital humanities work and the methodologies we apply.

Underwood seems especially aware of this in noting that the issue with MONK and Voyant was that they “didn’t permit me to make my own methodological innovations.” However, Underwood goes on to say that text mining can present “a kind of evidence we aren’t accustomed to yet. But lists of overrepresented words can be a fruitful source of critical leads to pursue in more traditional ways.” The idea that it is a fault of these tools to offer us something we are unaccustomed to seems to contradict both Underwood’s earlier sentiment of wanting to develop new methodologies and Posner’s overall agenda. I wonder, then, whether text analysis can in fact be leveraged in a way Posner would see fit, or whether these tools will be forced back into the framework of our “traditional ways.”