DH Curation Guide presentations
Slide deck (PDF)
In 2010-12, I was involved with the DH Curation Guide, a wonderful community resource guide to data curation in the digital humanities.
Slide from the presentation deck
September 27, 2011
Graduate School of Library and Information Science, University of Illinois Urbana-Champaign
I spoke about our text modeling choices and our publishing workflow in my Electronic Publishing class, taught by the inimitable Julia Flanders.
Notes & script for Text Modeling presentation:
Outline of presentation
- My involvement as UI designer & co-editor
- Intro to UI of DHCuration
- Sidenote review of HTML and CSS
- XSLT: what is it for? How do I write it? Close look at code (+ application sharing?)
- Why push article content from XML to XSLT to HTML? What flexibilities?
- Modeling choices like FAQ and Glossary as straight-up HTML
[screenshot of DHCuration main page]
Intro to my involvement
I'm currently a research assistant within DCEP-H, where one of the projects is this guide to data curation in the digital humanities. I am a co-editor with Trevor and Julia, and right now my main priorities are developing the user interface and planning strategies for the launch and the site's sustainability.
I became involved last year as an hourly, purely as a UI designer because I've worked a lot in the past with HTML and CSS as well as graphic design tools like the Adobe Creative Suite. In undergrad, I was a web designer for grant-funded humanities projects, so I had a background with academic user interfaces. But this project, the Guide, was more complex than what I'd worked with before.
Intro to UI
The site will contain around a dozen articles that are half general intro to a topic, and half portal to other resources. These pages are encoded using a version of the TEI that has been customized by Trevor and Julia. They're the ones who have masterminded the XML.
[screenshot of article]
Here's what the user interface for the articles in the project currently looks like. It's not quite polished and it's a little bare-bones, and you can blame my love of minimalism for that.
I should also mention that we're using off-the-shelf commenting software called Disqus, and I'll talk a little bit about how we're using that later.
So, as you scroll around the page, you might be able to see how the article has been split up into sections, paragraphs, resources, and groups of resources. All of this is described by the XML. So is stuff like glossary terms, article authors, and the titles of sections: those are all elements in the TEI-based XML schema that Trevor and Julia wrote.
But this document is in HTML. And all the pretty stuff, like the slight shadow behind section titles and the yellow boxes of the comments, that's all in CSS. So how do we get from this, the XML, to this, the pretty page? [XML, arrow, HTML] Well, I use XSLT. But first, let me make a quick sidenote and talk briefly about HTML and CSS.
So, XSLT. I began teaching myself XSLT with help from Kevin Trainor. - Jeni Tennison book -
XSLT stands for Extensible Stylesheet Language Transformations. It's an XML-based language. Its main use is to take XML documents and rearrange their contents for a different output.
[diagram of xml-xslt-html]
XSLT can process XML documents to make other XML documents, or (by way of an intermediate format such as XSL-FO) to make PDFs, or in our case, to take the character data from the XML-encoded articles and turn it into HTML, which is then styled with CSS. The stylesheet is just one long document full of rules to rearrange XML content. And all we need is one stylesheet to transform many documents in a uniform way.
It can transform documents in a very automated way, as in on the fly: you upload your XML documents and they're immediately spit out online as HTML. But we have a smaller-scale model, with static documents. So I, by hand, take every XML document we have, run it through the XSLT processor, and upload the resulting HTML file to the site.
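The "one set of rules, many documents" idea can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the Guide's actual stylesheet or oXygen setup: it uses Python's standard-library ElementTree as a stand-in for an XSLT processor, and the element names (article, title, p), the function names, and the folder layout are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path


def article_to_html(xml_text):
    """Apply one fixed set of rules to one XML article (hypothetical element names)."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="Untitled")
    paragraphs = [p.text or "" for p in root.findall("p")]
    body = "\n".join(f"<p>{escape(text)}</p>" for text in paragraphs)
    return f"<html><body>\n<h1>{escape(title)}</h1>\n{body}\n</body></html>"


def transform_folder(source_dir, output_dir):
    """Run every .xml file in source_dir through the same rules, writing .html files."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for xml_path in sorted(source_dir.glob("*.xml")):
        html_path = output_dir / (xml_path.stem + ".html")
        html = article_to_html(xml_path.read_text(encoding="utf-8"))
        html_path.write_text(html, encoding="utf-8")
        written.append(html_path)
    return written
```

Calling transform_folder("articles", "site") on a folder of article files would produce one static HTML file per article, which mirrors the static, run-it-by-hand model rather than an on-the-fly server setup. Uploading the results is left out of the sketch.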
[3 big screenshots]
So, I'm doing everything in oXygen — checking on the XML, but also writing the XSLT, HTML, and CSS. oXygen has built-in processors that are a kind of behind-the-curtain machinery for getting the transformation done.
Here are some example close-ups of sections of code from each language that deal with the same content: the author's name and affiliation and the article title. I've color-coded them to make it easier to see. Each snippet of text is described in the XML: this is the family name, this is the given name. Then the XSLT document says, for any XML that comes in here, any values marked family name go here or there. The values are selected by path, that is, by where they sit in the XML tree. Once we hit 'transform', it spits out an HTML document with these values in the places the XSLT defined. You can see they're in a different order than in the XML. And finally, this is how it looks in your browser.
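Since the screenshots don't survive in these notes, here is a rough sketch of the same idea: values picked out by their path in the XML tree and placed in a different order in the HTML. It is written in Python with the standard-library ElementTree rather than XSLT, and the element names, class names, and sample values are placeholders, not the Guide's real TEI customization.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article header with placeholder values.
SAMPLE_XML = """
<article>
  <header>
    <title>Sample Article Title</title>
    <author>
      <name><given>Jane</given><family>Doe</family></name>
      <affiliation>Sample University</affiliation>
    </author>
  </header>
</article>
"""


def render_byline(xml_text):
    root = ET.fromstring(xml_text)
    # Each value is selected by its path in the tree, much like an XSLT select expression.
    title = root.findtext("header/title", default="")
    given = root.findtext("header/author/name/given", default="")
    family = root.findtext("header/author/name/family", default="")
    affiliation = root.findtext("header/author/affiliation", default="")
    # The output order is decided here, independent of the order in the source XML:
    # family name comes first even though the XML lists the given name first.
    return (
        f"<h1>{escape(title)}</h1>\n"
        f'<p class="byline">{escape(family)}, {escape(given)} '
        f'<span class="affiliation">({escape(affiliation)})</span></p>'
    )


print(render_byline(SAMPLE_XML))
```

The point of the sketch is only that the XML says what each value is, and the transformation rules say where it goes.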
[marked up Intro screenshot]
XSLT can do a lot of other things, though. It's really a programming language that allows us to control the XML content very tightly. We're also using it to give each paragraph a unique identifier, which is then used to attach a commenting thread to that paragraph. We want every element of the page to be commentable: the whole article, each section, each paragraph. That means we need about 40 comment threads to load on the same page, each attached to the right object, which is ensured by the unique and persistent identifiers that XML and XSLT can set from the get-go. This is important because we've also numbered the paragraphs on the page. These numbers are separate from the paragraph identifiers, because the author might go back later and add or delete a paragraph. Renumbering 40 paragraphs by hand can get tedious, so XSLT just does it automatically. But we still want the comments on a certain paragraph to stick with that one. I hope that isn't terribly confusing. In a nutshell, this system allows for flexibility of representation and convenience of editing the content.
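The difference between a persistent identifier and a display number is easier to see in a small example. The sketch below is in Python rather than XSLT, and it makes some assumptions of its own: that each paragraph carries its identifier in an xml:id attribute in the source XML, and that the comment thread is keyed off a made-up data-thread attribute. Neither is a claim about how the Guide or Disqus actually does it.

```python
import xml.etree.ElementTree as ET
from html import escape

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical article body, before and after the author inserts a paragraph.
BEFORE = '<section><p xml:id="p-intro">First.</p><p xml:id="p-scope">Second.</p></section>'
AFTER = (
    '<section><p xml:id="p-intro">First.</p><p xml:id="p-new">Added later.</p>'
    '<p xml:id="p-scope">Second.</p></section>'
)


def render_paragraphs(xml_text):
    root = ET.fromstring(xml_text)
    lines = []
    # The visible number is recomputed from position on every run,
    # while the identifier is read from the source and never changes.
    for number, p in enumerate(root.iter("p"), start=1):
        pid = escape(p.get(XML_ID, ""))
        lines.append(
            f'<p id="{pid}" data-thread="{pid}">'
            f'<span class="num">{number}</span> {escape(p.text or "")}</p>'
        )
    return "\n".join(lines)


print(render_paragraphs(BEFORE))
print(render_paragraphs(AFTER))
```

In the second output the paragraph with the identifier p-scope is renumbered from 2 to 3, but its identifier (and so whatever comment thread is keyed to it) stays the same.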
But what is the point of doing all of this? Trevor talked about the value of publishing in XML, and I'll just emphasize it. You might object that we could just as easily do this straight-up in HTML and CSS without having to do anything in XML and XSLT. That's true, but it would not be as flexible or convenient. In the model we're using, the article content can be poured into many different kinds of output with minimal effort. Describing the data of the document with meaning adds value to the data, because it can be easily reused and retooled in the future.
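As a small illustration of "pouring" the same content into different outputs, here is a sketch in Python (again standing in for XSLT) that takes one described document and produces both an HTML page body and a plain-text outline. The element names and the two output formats are hypothetical examples, not outputs the Guide necessarily produces.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical article with placeholder content.
SAMPLE_XML = """<article>
  <title>Sample Article Title</title>
  <section><head>First Section</head><p>Some text.</p></section>
  <section><head>Second Section</head><p>More text.</p></section>
</article>"""


def to_html(root):
    """Output 1: headings and paragraphs as HTML."""
    parts = [f"<h1>{escape(root.findtext('title', default=''))}</h1>"]
    for section in root.findall("section"):
        parts.append(f"<h2>{escape(section.findtext('head', default=''))}</h2>")
        parts.extend(f"<p>{escape(p.text or '')}</p>" for p in section.findall("p"))
    return "\n".join(parts)


def to_outline(root):
    """Output 2: a plain-text outline built from the same section titles."""
    lines = [root.findtext("title", default="")]
    for number, section in enumerate(root.findall("section"), start=1):
        lines.append(f"  {number}. {section.findtext('head', default='')}")
    return "\n".join(lines)


root = ET.fromstring(SAMPLE_XML)
print(to_html(root))
print(to_outline(root))
```

Because the source says which text is a title and which is a section heading, a second output only needs a second set of rules, not a second copy of the content.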