http://www.tei-c.org/index.xml
http://professional.mit.edu/programs/short-programs/machine-learning-big…
Using TEI encoding, I created a digital representation of a text document. The short excerpt I encoded comes from “A Poem on the Bill Lately Passed for Regulating the Slave Trade” by Helen Maria Williams. One of the main components of a TEI document is the TEI header, encoded as the <teiHeader> element.
This header contains the basic, necessary information about the digital document. Tags that can appear in the header section include <author>, <title>, <distributor>, <address>, <email>, <date>, <publisher>, <pubPlace>, <language>, <notesStmt>, and quite a few more. With this information, the reader can learn a lot about the document before getting into its main content.
Reading through the TEI header of Williams’ poem gave me a good deal of information. Below I list the details from the header that seemed most important to me.
Title: A Poem on the Bill Lately Passed for Regulating the Slave Trade
Author: Williams, Helen Maria, 1761-1827
Distributor: Dartmouth College
Publisher: Printed for T. Cadell, in the Strand
Publication Place: London
Publication Date: 1788
Language: English
In general terms, this information tells us that the author probably has English roots and wrote at a time when slavery and individual rights were major issues in society. The header and the body of the document are clearly different. The header describes the document, while the body contains the actual text that is to be read. The body can hold character details, storylines, or descriptions.
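To make the split between header and body concrete, here is a minimal Python sketch that reads both parts of a small TEI file using the standard library. The sample XML is my own simplified reconstruction built from the values listed above; the nesting (for example, placing the publisher details inside <sourceDesc>), the TEI P5 namespace, and the bracketed placeholder lines in the body are assumptions, and the actual Dartmouth file may be organized differently. The helper names read_header and read_body_lines are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the TEI P5 namespace. Older TEI files may not.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified sample; the two verse lines are placeholders, not the poem's text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Poem on the Bill Lately Passed for Regulating the Slave Trade</title>
        <author>Williams, Helen Maria, 1761-1827</author>
      </titleStmt>
      <publicationStmt>
        <distributor>Dartmouth College</distributor>
      </publicationStmt>
      <sourceDesc>
        <bibl>
          <publisher>Printed for T. Cadell, in the Strand</publisher>
          <pubPlace>London</pubPlace>
          <date>1788</date>
        </bibl>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <langUsage>
        <language ident="en">English</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <lg>
        <l>[first line of the poem]</l>
        <l>[second line of the poem]</l>
      </lg>
    </body>
  </text>
</TEI>"""


def read_header(root):
    """Collect a few metadata fields from the <teiHeader> element."""
    header = root.find("tei:teiHeader", TEI_NS)
    wanted = {
        "title": "tei:title",
        "author": "tei:author",
        "distributor": "tei:distributor",
        "publisher": "tei:publisher",
        "publication_place": "tei:pubPlace",
        "date": "tei:date",
        "language": "tei:language",
    }
    fields = {}
    for name, tag in wanted.items():
        # ".//" searches at any depth below the header.
        element = header.find(f".//{tag}", TEI_NS)
        has_text = element is not None and element.text
        fields[name] = element.text.strip() if has_text else None
    return fields


def read_body_lines(root):
    """Return the text of every verse line (<l>) inside <text>/<body>."""
    body = root.find("tei:text/tei:body", TEI_NS)
    return [line.text for line in body.findall(".//tei:l", TEI_NS)]


root = ET.fromstring(SAMPLE)
metadata = read_header(root)
for name, value in metadata.items():
    print(f"{name}: {value}")
print(read_body_lines(root))
```

The header lookup returns information about the document, while the body lookup returns only what is meant to be read, which is the same distinction described above.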
TEI headers make for a well-structured digital document. They are necessary because without them the reader is left without key information. What purpose does a document serve if it provides no author name, title, date, or subject? A long, well-structured header is better than a short one or none at all, because it gives the reader more context and enhances the encoded document.
What made encoding this poem a bit difficult was the way the text of the poem was structured and the wording the author used. Overall, I enjoy TEI encoding because I find it fun to create documents in XML and experiment with the many different tags.
A cool project would be to combine machine learning algorithms with TEI encoding practices. The goal would be a program that automatically turns any text-based document into structured XML that follows the TEI encoding guidelines. I believe that a bot that produces TEI-encoded files is in our near future and will have a profound impact on digital humanities.
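As a starting point for that idea, here is a small rule-based sketch in Python. It is not machine learning: it only assumes that stanzas in the plain text are separated by blank lines and that each non-empty line is one verse line. A learned model would have to replace these hand-written rules to handle messier input. The function names poem_to_tei and split_stanzas, and the placeholder text in the header, are hypothetical and chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize the TEI namespace as the default namespace (no prefix).
ET.register_namespace("", TEI_NS)


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def split_stanzas(plain_text):
    """Assumed rule: blank lines separate stanzas, each line is one verse."""
    blocks = re.split(r"\n\s*\n", plain_text.strip())
    stanzas = []
    for block in blocks:
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines:
            stanzas.append(lines)
    return stanzas


def poem_to_tei(title, author, plain_text):
    """Build a minimal TEI document (header plus body) and return it as a string."""
    tei = ET.Element(q("TEI"))

    # Header: the three parts of <fileDesc>, with placeholder statements.
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    ET.SubElement(title_stmt, q("author")).text = author
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Placeholder publication statement."
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = "Placeholder source description."

    # Body: one <lg> per stanza, one <l> per verse line.
    text = ET.SubElement(tei, q("text"))
    body = ET.SubElement(text, q("body"))
    for stanza in split_stanzas(plain_text):
        line_group = ET.SubElement(body, q("lg"), {"type": "stanza"})
        for verse in stanza:
            ET.SubElement(line_group, q("l")).text = verse

    return ET.tostring(tei, encoding="unicode")


# Usage with placeholder verse text rather than the poem itself.
example = "first line\nsecond line\n\nthird line\nfourth line\n"
print(poem_to_tei("Example title", "Example author", example))
```

The output is well-formed XML, but it has not been validated against a TEI schema, and real documents would need far more than line and stanza detection.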