Monday, December 13, 2010

How Do You Transcribe?

As the planning and development of T-PEN begins to take practical shape, I have begun to think about my transcription methodology – the way I have transcribed manuscripts in the past – and about what it implies for T-PEN’s user interface and usability.

Transcription is a very practical activity, and it is usually something I have done rather than thought too deeply about. However, when I sat back and reflected on my methodology, I noticed that I tend to take advantage of the digital medium and work from digitized manuscripts, with copies of all the manuscripts containing the text open on my computer as I transcribe. Furthermore, I tend to work by sense-unit, transcribing a few short words from my base manuscript and then checking this reading against all the other manuscripts before I move on to the next unit of text. In effect, I transcribe only one manuscript and collate, sense-unit by sense-unit, as I transcribe.

I am certainly not claiming that this is the ideal method of transcription, but it struck me that it is quite different from the traditional approach of transcribing the entire text from one manuscript, then moving to the next manuscript and transcribing or collating the entire text once again. Of course, that approach originally arose because the requisite manuscripts were often housed in different libraries, making it impossible to compare them all as one transcribed. Microfilm and other methods of reproduction certainly changed this situation, but the arrival of digitization has transformed it radically.

Nonetheless, the basic paradigm around which we have built the T-PEN prototype is still the old approach of transcribing the entire text one manuscript at a time. Reflecting on my own practices has shown me that there are clearly other ways of transcribing a text, and T-PEN will be a far better tool if it is adaptable to the user’s preferred approach to transcription, whatever that may be.

As a result, we are currently considering whether we should enable users to move between different manuscripts as they transcribe and/or facilitate other transcription techniques. I would therefore like to invite you to reflect on your own approach to transcription and on what kind of digital tool would best fit your preferred practices. How flexible or rigid would you like such a tool to be: should it provide a variety of options, or be built around what we consider best practice? Even more fundamentally: what is your own transcription methodology – exactly how do you transcribe? All comments are welcome below!

(Manuscript images from the e-codices collection and Codices Electronici Ecclesiae Coloniensis. Used in accordance with conditions of use.)

Friday, December 10, 2010

Painting a Beach: TEI P5, Paleography, Editing, and Digitising

One of the primary features of T-PEN will be its “auto-encoding”: transcribers will be able to insert commonly used XML markup into their transcriptions without needing to understand the technicalities. At the CCL, we chose TEI P5 for our encoding, and our first major release of T-PEN will orient its auto-encoding toward TEI P5 markup. The hope is that projects involving large quantities of transcribed text will no longer be crushed under the burden of encoding, for which the labour can certainly be extensive. Our premise is that it is the scholar – who is often the transcriber – who has the appropriate expertise to identify the components to be marked, although perhaps not the expertise in TEI.

TEI P5 is rich, perhaps overly rich, in choices, and as TEI modules multiply, so do the choices. One can agonise for a long time over which tag to use, especially as many of the distinctions are subtle, and, as the TEI documentation makes clear, different modules sometimes offer different possibilities for the same phenomenon. Working within the “Manuscript Description” module (chapter 10) gives options that differ somewhat from those of the “Transcription of Primary Sources” module (chapter 11), which in turn differ from those of the “Critical Apparatus” module (chapter 12); and one might be drawing on other major areas of TEI as well (“verse”; “certainty, uncertainty, and responsibility”; etc.).

We anticipate that T-PEN users will typically combine these roles: they will be, to some degree, paleographers interested in editing, or historians working with images of primary sources that they hope to edit. How will we make intelligent decisions about which tags are most commonly used, and most useful? Although we have designed our development process to include extensive testing on a range of use cases, we welcome additional comments from anyone with experience in using TEI P5 in a project centred on manuscripts or handwritten documents.
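To make the overlap concrete, here is a minimal sketch of how one and the same patch of water damage might be recorded under each of the three modules just mentioned. The sample Latin text, folio references, and witness sigla are all invented for illustration. In the manuscript description (module 10), the damage is summarised as prose within the physical description:

    <physDesc>
      <objectDesc>
        <supportDesc>
          <!-- hypothetical condition note -->
          <condition>Severe water damage obscures fols. 8-12.</condition>
        </supportDesc>
      </objectDesc>
    </physDesc>

In the transcription of the page itself (module 11), the same fact is attached to the affected words:

    <damage agent="water" degree="medium">in principio</damage>
    <unclear reason="water damage" cert="low">erat</unclear>

And in a critical apparatus (module 12), it appears only insofar as it yields a doubtful reading in one witness:

    <app>
      <lem wit="#B">in principio erat</lem>
      <!-- witness A is the hypothetical water-damaged manuscript -->
      <rdg wit="#A">in <unclear reason="water damage">principio</unclear> erat</rdg>
    </app>

All three records are legitimate TEI P5; they simply answer different questions, which is precisely why the choice of tags to offer in auto-encoding is not obvious.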

I referred in the title of this post to “Painting a Beach”. That is because granularity is also an issue, and it is potentially one of the greatest shifts between preparing a print edition on the one hand and presenting material digitally on the other. In a print edition, it is normal to have an introductory study that might report, say, that a manuscript has severe water damage on fols. 8-12, 24-32, and 118-140, and that trimming interferes with reading the marginal glosses throughout. TEI P5 allows one to record on a word-by-word (or even letter-by-letter) basis what is obscure, why it is obscure, and how obscure it is. So, instead of painting a beach with a few strokes of sand-coloured paint, one can paint every grain of sand. Accurate, but efficient? Do paleographers and editors opt for significantly different levels of detail in their markup practices? If the point of markup is to provide a foundation for display programming, what sort of features are likely to be displayed? Or to what extent is markup just a form of recording information, with no use of digital tools foreseen? These are some of the larger theoretical questions upon which we meditate as we develop the auto-encoding feature.
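The contrast can be sketched in markup as well; once again the Latin text and attribute values are invented for illustration. The few-brushstrokes approach records the damage once, over the whole span:

    <damage agent="water" degree="high">in principio erat verbum</damage>

Painting every grain of sand qualifies each word separately:

    <damage agent="water">
      <unclear reason="water damage" cert="high">in</unclear>
      <unclear reason="water damage" cert="high">principio</unclear>
      <unclear reason="water damage" cert="low">erat</unclear>
      <!-- an editorial conjecture for an illegible word; "#ed" is a hypothetical editor id -->
      <supplied reason="illegible" resp="#ed">verbum</supplied>
    </damage>

Both are valid TEI P5, but the second multiplies the encoding labour many times over, and whether that labour pays off depends largely on what a display or search layer will eventually do with the distinctions.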