Tuesday, October 26, 2010

Quick hello from the Co-PI (Abigail Firey, Univ. of Kentucky)

I come to T-PEN as the director of the Carolingian Canon Law (CCL) project, one of whose central activities is the transcription of unedited, usually unprinted manuscripts. It is a highly collaborative project, designed to receive contributions from scholars present and future, known and unknown, in order to build a “conceptual corpus” of the legal texts known to Carolingian jurists. Because of its open nature and its need for transcriptions prepared to the highest editorial standards, the project will benefit enormously from a tool that allows easy transcription and simultaneous preparation of an encoded file (we are using TEI-P5) without requiring contributors to know any markup, and that also allows ready verification and correction of transcriptions.

Transcribing is (as experienced scholars know!) a demanding task, and there are dangers lurking. We had a strange experience on the CCL when we were reviewing a transcription that had been prepared from an existing electronic version of a text and then altered to match the manuscript readings: until we started line-by-line proofreading, we did not notice that very familiar words in the first rubric were, in fact, missing from the manuscript! Had the transcriber been using T-PEN to create a new transcription, the error likely would not have occurred; and even if the presumed reading had crept in, it would have been easy to check and correct the transcription.

Our other challenge has been keeping up with encoding transcriptions (the short answer is that we haven’t). We cannot wait to implement T-PEN’s “auto-encoding” function, so that our research assistants can dig into serious scholarship instead of encoding all the time! (Some readers may remember Stan Rogers’ “White Collar Holler”: “Can you code it? Program it right!”)


I'm Jon Deering, the Senior Developer working on T-PEN.
As Jim said, I'm getting started on T-PEN today. The first thing I needed to do was build a basic front page for the tool. The transcription prototype was built for ENAP, which involved only a single manuscript. Although it was designed to support other manuscripts, including manuscripts from other repositories, we had no good way to browse the manuscripts available in the tool. The front page I built lists the available manuscripts along with the name of each hosting repository, and each entry links to the first page of that manuscript available for transcription. During our Monday meeting, I'm going to see if we can get the go-ahead to make at least one full repository, maybe more, available in the prototype. By then I think we will have our domain, and anyone will be able to go in and test transcribing with those few hundred manuscripts.
I'm also working on customizable hotkeys, which allow transcribers to set up a number of nonstandard characters they use often in transcribing (for example, Þ, Ð, and Æ in Middle English). Those characters will be clickable on a toolbar at the bottom and will also have Control-key combinations assigned to them, starting with Control+1 through Control+9. We had a static set for Middle English when a paleography class used our prototype last fall, and the feedback we received was very positive.
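For readers curious how such a hotkey set might behave, here is a minimal sketch in Python. It is not T-PEN's code: the `HotkeySet` class, the `insert_at_cursor` helper, and the nine-slot limit (taken from the "control 1-9" range described in the post) are assumptions for illustration only. Each user-chosen character gets a toolbar button and a Control+digit shortcut, in order.

```python
# Hypothetical sketch only: not T-PEN's actual implementation.
from typing import List, Optional, Tuple

MAX_HOTKEYS = 9  # one slot per shortcut, Control+1 through Control+9
SHORTCUT_DIGITS = "123456789"


class HotkeySet:
    """A transcriber's custom characters, bound to Control+1..Control+9 in order."""

    def __init__(self, characters: List[str]) -> None:
        if len(characters) > MAX_HOTKEYS:
            raise ValueError(f"at most {MAX_HOTKEYS} hotkeys are supported")
        self.characters = list(characters)

    def toolbar(self) -> List[Tuple[str, str]]:
        """Return (shortcut label, character) pairs for the clickable toolbar."""
        return [(f"Control+{i}", ch) for i, ch in enumerate(self.characters, start=1)]

    def resolve(self, key: str, ctrl: bool) -> Optional[str]:
        """Return the character bound to Control+<key>, or None if nothing is bound."""
        if not ctrl or len(key) != 1 or key not in SHORTCUT_DIGITS:
            return None
        index = int(key) - 1  # "1" maps to slot 0
        return self.characters[index] if index < len(self.characters) else None


def insert_at_cursor(line: str, cursor: int, ch: str) -> Tuple[str, int]:
    """Insert ch into the transcription line; return the new text and cursor position."""
    return line[:cursor] + ch + line[cursor:], cursor + len(ch)


if __name__ == "__main__":
    middle_english = HotkeySet(["Þ", "þ", "Ð", "ð", "Æ", "æ"])
    print(middle_english.toolbar()[0])           # ('Control+1', 'Þ')
    ch = middle_english.resolve("2", ctrl=True)  # 'þ'
    print(insert_at_cursor("e wey", 0, ch))      # ('þe wey', 1)
```

Keeping the toolbar and the shortcuts driven by the same ordered list means a transcriber only configures the characters once; both the buttons and the key bindings follow from it.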
As far as introducing myself goes, there isn't much to say. I keep a rather low profile online, not using Facebook and only using Twitter for work-related communication. While I don't have a humanities background, I have been caught puzzling over the peculiarities of manuscripts now and then, simply because they end up in front of me while I work on projects. This is particularly true of the image processing project we did over the summer, which I'll post about once I can include links to the results. I really enjoy the image processing side of our work, and the way turning the computer loose on images of a manuscript can aid not only in editing but also in exploring the document. I would say that is my favorite aspect of working on both ENAP and T-PEN.
In July I attended THATCamp London, an unconference for digital humanities held right before DH2010. I enjoyed the opportunity to share my work and to find that others in the field had solved some of the problems I was dreading. THATCamp provides a less formal, very open and collaborative forum where such things can happen. In particular, while I was showing our transcription prototype to a few people, one of them showed me an in-house tool used for transcription at the library where he worked. Their method of representing overlapping tags with colored underlining is an elegant solution, and something I expect to use in T-PEN.
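The post doesn't describe how that library's tool worked internally, so what follows is only one plausible approach, sketched in Python under our own assumptions. Each tag is a hypothetical `Tag` span of character offsets (assumed to lie within the line), and overlapping spans are placed on separate underline rows with greedy interval partitioning: sort by start offset and reuse the first row that is already free. Each row could then be drawn as its own colored underline beneath the text, so that, say, a gloss overlapping a correction stays visible even though the two cannot nest as XML elements. The `preview` function is a plain-text stand-in for the colors.

```python
# Hypothetical sketch only: one way to lay out overlapping tags as stacked underlines.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tag:
    name: str   # e.g. "rubric", "gloss", "correction"
    start: int  # first character offset covered (inclusive)
    end: int    # offset just past the last character covered (exclusive)


def assign_rows(tags: List[Tag]) -> Dict[Tag, int]:
    """Give every tag an underline row so that tags sharing a row never overlap.

    Greedy interval partitioning: process tags by start offset and reuse the
    first row whose last tag has already ended. This uses the minimum number
    of rows, i.e. the largest number of tags covering any single character.
    """
    row_ends: List[int] = []  # row_ends[r] = end offset of the last tag placed on row r
    rows: Dict[Tag, int] = {}
    for tag in sorted(tags, key=lambda t: (t.start, t.end)):
        for r, end in enumerate(row_ends):
            if end <= tag.start:
                row_ends[r] = tag.end
                rows[tag] = r
                break
        else:  # no free row: open a new one
            row_ends.append(tag.end)
            rows[tag] = len(row_ends) - 1
    return rows


def preview(text: str, tags: List[Tag]) -> str:
    """Text stand-in for colored underlines: one line per row, marked with each tag's initial."""
    rows = assign_rows(tags)
    lines = [[" "] * len(text) for _ in range(max(rows.values(), default=-1) + 1)]
    for tag, r in rows.items():
        for i in range(tag.start, tag.end):
            lines[r][i] = tag.name[0]
    return "\n".join([text] + ["".join(line).rstrip() for line in lines])


if __name__ == "__main__":
    line = "In nomine domini amen"
    tags = [Tag("rubric", 0, 16), Tag("gloss", 3, 9), Tag("correction", 10, 21)]
    print(preview(line, tags))
```

Because rows are only added when tags genuinely overlap, a line with no overlapping markup needs a single underline row, and a renderer remains free to color each tag type however it likes.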

Welcome Back!

Jon Deering, T-PEN's senior developer, begins work on the project today. Jon was the sole developer for the Electronic Norman Anonymous Project (ENAP), which finished in July 2010. It is great to have him back. His enormous energy and creative approach to software engineering have earned him a great deal of respect and praise here at Saint Louis University and among our collaborating institutional partners. We know we are going to see great things from him as we all get hip-deep into T-PEN development.

Jon will introduce himself, along with the other project staff, in the near future.

Wednesday, October 20, 2010

T-PEN Strategy Meeting

Yesterday the core design team for T-PEN met to plan the workflow for the next 18 months. We worked through a hefty agenda that took almost six hours to complete, but we covered everything! That included a terrific "on the fly" schematic outline of T-PEN's basic functionality that Jon Deering, T-PEN's senior developer, drew on the whiteboard (right). It was exciting to start talking about the practical realities of a project we've been dreaming about for a few years now.

We've got lots to do, but there is going to be some fun work ahead.

Sunday, October 10, 2010

What is T-PEN?

T-PEN (Transcription for Paleographical and Editorial Notation) is a digital tool for scholars who work with digital images of unpublished manuscripts housed in digital repositories throughout the world. T-PEN will provide a fully equipped digital workspace in which the scholar, while constantly viewing the manuscript images, transcribes line by line, makes notes about problematic paleographic features, documents glosses and corrections or revisions to the manuscript, and may (either during transcription or after further research) add interpretative or bibliographic information pertaining to particular lines or larger sections of the text. With this tool, the transcribed text can also be immediately encoded with XML markup to indicate any given feature of the text (e.g., a rubric, colophon, gloss, lemma, correction, quire signature, or citation).

This blog will track the 18 months of software development and testing we have planned. This project is being funded by both the National Endowment for the Humanities and the Andrew W. Mellon Foundation.