Tuesday, November 30, 2010

Of lines and columns

TPEN’s current image parsing process begins by identifying columns, then identifies the lines of text within those columns. This week I worked with an early version of another tool, which asks the user to draw a bounding box around each line of text without specifying columns first. This method has an upside and a downside. The upside is that each line’s left and right boundaries are more precise than when they are simply inherited from a column. In the TPEN UI, we have to display a reasonable amount of the image beyond the column boundaries (currently 4 times the mean height of lines on the page) to account for cases where a line extends outside the column or the column is skewed. A precise bounding box for each line doesn’t have this issue. The downside of not having columns is that it is not easy to order the lines properly for transcription: you cannot just take the top line from the leftmost column, continue to the bottom of that column, then move to the next column. This particular tool currently displays lines based solely on their vertical starting point, meaning that in a two-column document you will get the first line of one column, then the first line of the other.
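The ordering problem is easy to see in a small sketch. The Python below is only an illustration, not code from TPEN or the other tool: the `Line` record, its `column` and `top` fields, and the sample coordinates are all assumptions made up for the example.

```python
from typing import NamedTuple


class Line(NamedTuple):
    column: int  # hypothetical column index, 0 = leftmost column
    top: int     # y coordinate of the line's top edge, in pixels
    text: str    # label used here only to make the ordering visible


# A made-up two-column page: lines A1, A2 on the left, B1, B2 on the right.
lines = [
    Line(0, 10, "A1"), Line(1, 12, "B1"),
    Line(0, 40, "A2"), Line(1, 41, "B2"),
]


def order_by_top(page_lines):
    """Order lines by vertical starting point only (no column knowledge)."""
    return sorted(page_lines, key=lambda line: line.top)


def order_by_column(page_lines):
    """Order lines column by column, top to bottom within each column."""
    return sorted(page_lines, key=lambda line: (line.column, line.top))


by_top = [line.text for line in order_by_top(lines)]
by_column = [line.text for line in order_by_column(lines)]
```

Sorting on the vertical position alone interleaves the two columns, while sorting on the column first and the vertical position second gives the order a transcriber expects.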
Having seen the upsides and downsides of both methods, I think a hybrid method would be best: either one that groups individually boxed lines into columns, or one that searches for lines within columns but then attempts to find the left and right boundaries of each particular line, and allows those values to be used in the UI without losing the line-to-column relationship. I am going to prototype the latter method this week and see if it ends up being the best of both worlds.
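For comparison, here is a rough Python sketch of the first option, grouping individually boxed lines into columns. It is not the method being prototyped and not TPEN code. The `Box` type, the function names, and the rule that a box joins a column when at least half of its width overlaps that column are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def group_into_columns(boxes, min_overlap=0.5):
    """Group line boxes into columns by horizontal overlap.

    A box joins the first existing column that covers at least
    `min_overlap` of the box's width; otherwise it starts a new column.
    The 0.5 default is an arbitrary, assumed threshold.
    """
    columns = []  # each column: {"left": int, "right": int, "lines": [Box]}
    for box in sorted(boxes, key=lambda b: b.left):
        width = box.right - box.left
        for col in columns:
            overlap = min(box.right, col["right"]) - max(box.left, col["left"])
            if width > 0 and overlap >= min_overlap * width:
                col["lines"].append(box)
                col["left"] = min(col["left"], box.left)
                col["right"] = max(col["right"], box.right)
                break
        else:
            columns.append({"left": box.left, "right": box.right, "lines": [box]})
    columns.sort(key=lambda c: c["left"])
    for col in columns:
        col["lines"].sort(key=lambda b: b.top)
    return columns


def reading_order(boxes):
    """Left column first, top to bottom, then the next column."""
    return [box for col in group_into_columns(boxes) for box in col["lines"]]


# Made-up boxes for a two-column page, deliberately given out of order.
page = [
    Box(300, 10, 580, 30), Box(10, 12, 270, 32),
    Box(12, 40, 290, 60), Box(305, 42, 570, 62),
]
ordered = reading_order(page)
```

Each box keeps its own left and right edges, so the per-line precision is preserved while the column grouping supplies the transcription order. A skewed or very ragged column would probably need a smarter grouping rule than this fixed overlap threshold.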

Wednesday, November 24, 2010

Defining Digital Humanities

For the last month or so I've been on an evangelistic kick at my home institution promoting digital humanities. It has been a lot of fun learning what my colleagues in the Humanities departments think when they hear the phrase "digital humanities." Inevitably, the tables get turned on me and I am asked: what do you think Digital Humanities is? This has heated up a bit in the last few days since the New York Times published an article on how Digital Humanities could unlock Humanities' riches. Reading that article, along with the almost 100 comments and the response to that response by Martin Foys and Asa Mittman, has prodded me further to reflect on defining what I do.

It is typical in both the humanities and the sciences to engage in research without first identifying its context in absolute terms. Indeed, one should always be wary of any a priori definition of research fields, since the more accurate accounts of academic work come out of praxis. So I haven't necessarily been embarrassed that I had never properly defined what I do as a digital humanist. But with the question dangling, I had to articulate my vision. Here's what I have come up with:

"As a form of scholarly work digital humanities  offers methods and resources that can strengthen the established methods of humanities research.  It can also help make the boundaries between the constituent disciplines more porous and thus bring together different groups of scholars and students.  Digital humanities comes closest to how our students engage the world on a daily basis as consumers of digital information. 

"Such possibilities may seem incommensurate with the common (mis)perception of digital humanities namely that is defined solely by the task of digitization.   This is certainly a principal task of digital humanities, but it hardly accounts for all that it is.  In sum, digital humanities comprises three general tasks: preservation, aggregation and integration. To begin with, digital humanists engage in the digitization of existing artifacts as an act of preservation, from text to images to 3D objects.  Digitization projects, however,  preserve artifacts for the sake of access. There is no point in digitizing if does not change the scope of access.  Digitization ought to be about broadening access.   As for aggregation,  the methods of digital humanities allow for a wider consultation and analysis of large data sets through automation.  Even if computer algorithms assist in this analysis, it is based on the core methods of “pattern matching” that most humanists use: finding common ideas or words/phrases in a set of texts, matching textual accounts to other cultural artifacts or practices, etc.  Digital humanities permits this type of work to occur on a larger scale, and often supports complex forms of pattern matching.   This data, when brought together, can give the humanist scholar ample evidence to draw significant conclusions.  I would include here the role that "crowdsourcing" has come to play recently.  Many projects can achieve so many more goals with a large number of scholars and interested parties working collaboratively.    Finally, the ample evidence from the aggregrate sources is analyzed within an interdisciplinary context. On a very basic level, it encourages the integration of text and image, but it can also provide ideal opportunities to integrate different textual types (or different sets of images) in a way that assist the reader or user to engage the complexity without becoming confused.  
Digital humanities can provide the virtual means to bring together disparate sources, ones that have never been connected before (and perhaps can never be physically present together), and provide for the humanist scholar the tools to develop a more complex picture of the topic under study. It can also bring together disparate scholars, which can open new ways to study and interpret cultural artifacts. One example is how historians are using GIS technology to contextualize historical narratives within a specific geographical space (paying attention to meteorology and other environmental conditions) or to map the path of documents as they traveled from reader to reader.

"Digital humanities can therefore breathe new life into the world of scholarship and teaching, without snuffing out how humanities scholars currently function.  Additionally, it can provide a mechanism for the critical evaluation of technology as both a pedagogical tool and a form of research.  As our culture demands immediate access to larger and larger sets of data, and also seeks ways to integrate that data in multivalent ways, digital humanities can assist students and professors in this complex, wired world."

I'm not suggesting that I have developed the definitive account of digital humanities.  Some of my fellow DHers might object to the way I privilege text -- although I employ a rather elastic notion of text as simply a container of information.   And, I'm sure I've not accounted for everything, but this is my modest contribution to trying to understand what we do as digital humanists and why we do it.

Tuesday, November 9, 2010

Recruiting a GUI Web Developer

T-PEN is currently advertising for a GUI Developer. The advertisement is posted online, and applications must be submitted online at jobs.slu.edu.

Here are the basic details:

Web Developer (Theological Studies)
Job Summary: Under general direction, assists in the creation and deployment of a web-based digital tool for scholars working with digital images of unpublished manuscripts; performs Graphical User Interface (GUI) related duties with responsibility for the user interface.
May include any and/or all of the following:

  1. Participates as a member of a design team, working with the senior developer, project directors, and other personnel.
  2. Designs a set of user-centered, interactive Web pages to integrate various APIs that will comprise the digital tool.
  3. Contributes to testing and bug tracking of various iterations of the digital tool.
  4. Responds to formal usability testing and makes appropriate changes in implementation.
  5. Oversees the Center's Web pages and ensures they are up to date and functional.
  6. Performs other duties as assigned.
  • Knowledge of Web development, applications, and technology
  • Knowledge of HTML/CSS and JavaScript
  • Knowledge of programming and graphic design
  • Project management skills
  • Interpersonal/human relations skills
  • Written and verbal communication skills
  • Ability to provide site management solutions
  • Ability to perform work that is technically oriented, working on various platforms, including Microsoft and Apple
  • Ability to recognize trends in Web development
  • Ability to maintain confidentiality
Education and experience equivalent to:
Bachelor's degree; supplemented with two (2) years related work experience

T-PEN's image

Branding in Digital Humanities is important, since it helps users easily identify a specific tool or methodology, especially if that tool interoperates with or can be integrated into larger frameworks. At T-PEN, we've started thinking about this, and we will make a more serious attempt at branding our tool in the next few months. For now, we offer this basic image.



I’m Jim Ginther and I am the Principal Investigator for T-PEN. I have been working in Digital Humanities for over a decade, and T-PEN is my seventh research project in DH. I am very excited about this project for two reasons. First, T-PEN is one step closer to a major dream of mine: to create an editing suite that assists the editor from the transcription stage, through editing, collating and annotation, to the final digital publication, all in a digital workspace. Given how many other teams are working towards this same dream, the T-PEN team is committed to interoperability: we want to ensure that users can take their transcriptions and import them easily into other tools. The second reason I am excited about T-PEN is that I am providing one of the regular use cases during development. I will begin a critical edition of the Super Psalterium of Robert Grosseteste (ca. 1170-1253). I have been studying the life and works of this English thinker since my doctoral studies. Grosseteste was one of the few masters we know by name at the University of Oxford in the early thirteenth century. He was also a polymath: a leader in natural philosophy (in the areas we now identify as cosmology and mathematical physics), an outstanding theologian, and bishop of Lincoln. His commentary on the Psalter is the last major work from his days at Oxford. Editing this text is not without its challenges, since its textual history is complicated. During T-PEN’s development I will be transcribing one of the manuscript witnesses of this large text (well over 200K words in length!). Given the unique character of the text, I will definitely be putting T-PEN through its paces.

Monday, November 8, 2010

YouTube video of T-PEN's Basic Features

One of Tomás O'Sullivan's responsibilities as T-PEN's Research Fellow is to document the tool's development and feature set. Here is his first video on YouTube, which describes T-PEN's basic features. The voice belongs to Tomás himself.

Tuesday, November 2, 2010


Hello! My name is Tomás O’Sullivan, and I am a research fellow on the T-PEN project.
I hail from Bantry, Co. Cork, Ireland, and hold degrees in Medieval History and in Theology from University College Cork and Mary Immaculate College, University of Limerick. I am currently based in the Department of Theological Studies and the Center for Digital Theology at Saint Louis University, where I cut my digital teeth over the last few years working on the Electronic Norman Anonymous Project with Jim Ginther and Jon Deering.
My research interests focus on the ecclesiastical culture of early medieval Ireland within its Insular and European contexts, with particular concentration on homilies, eschatology (conceptions of the end of the world and the afterlife) and hagiography (writings about the saints). My PhD dissertation examines a distinct collection of Insular homilies which survives in four manuscripts copied on the Continent in the ninth century; these manuscripts will form the basis for my test-case to run T-PEN through its paces as development proceeds. I’ll also function as the technical writer for the project, creating a user manual to accompany the final product.
Transcribing and editing the anonymous Latin homilies of the early Middle Ages is a daunting task, as these sermons were often composed from a variety of textual extracts and images which could be combined and recombined in a kaleidoscope of patterns; I’ve taken to using the phrase “microtexts in motion” to describe this situation where, very often, there is no such thing as a “stable” text. I’m excited about the possibility of using T-PEN’s automated encoding and personalized mark-up features to rein in these mobile microtexts. I’m confident that if T-PEN can help me tame these anonymous homilies, it should be able to handle anything!