TEI + Python + lxml + Dutch = Corpus Toneelkritiek Interbellum

I was pleased to be able to assist with the Corpus Toneelkritiek Interbellum project, which allows reading, browsing and searching of early 20th-century Dutch theater reviews. I can’t read Dutch, but Google’s automated translation tells me that the review of Hamlet mentions a “long modern clown,” which sounds disturbing enough that I’ll leave the actual reading to someone else.


The source documents are encoded in TEI XML and rendered to the browser using Python and lxml, three of my favorite technologies.
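As a rough illustration of what "rendered to the browser using Python and lxml" can look like, here is a minimal sketch that turns a TEI document into an HTML fragment with an XSLT stylesheet. The sample document, the stylesheet, and the `render_review` function are all invented for this example and are far simpler than anything in the real corpus; the project's actual rendering code may work quite differently.

```python
from lxml import etree

# Hypothetical, heavily simplified TEI review (not taken from the corpus).
SAMPLE_TEI = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hamlet</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Eerste alinea van de recensie.</p>
      <p>Tweede alinea.</p>
    </body>
  </text>
</TEI>"""

# Hypothetical stylesheet: maps the TEI title and paragraphs to plain HTML.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">
  <xsl:output method="html" encoding="UTF-8"/>
  <xsl:template match="/">
    <article>
      <h1><xsl:value-of select="//tei:titleStmt/tei:title"/></h1>
      <xsl:apply-templates select="//tei:body/tei:p"/>
    </article>
  </xsl:template>
  <xsl:template match="tei:p">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>"""


def render_review(tei_bytes):
    """Parse a TEI document and return an HTML fragment as a string."""
    doc = etree.fromstring(tei_bytes)
    # Compile the stylesheet into a callable transform object.
    transform = etree.XSLT(etree.fromstring(STYLESHEET))
    return str(transform(doc))


if __name__ == "__main__":
    print(render_review(SAMPLE_TEI))
```

In a web application the returned string would typically be handed to a template or sent back in the HTTP response; the transform itself can be compiled once and reused across requests.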

There are a few takeaways from this project that might benefit anyone working in a similar area and at a similar scale:

Use a standard encoding format (in this case TEI, but choose an appropriate one based on the source content)
Use a modern programming language, even in a humanities context (e.g. Python)
Use modern XML parsing tools (e.g. lxml + XPath + XSLT)

The key advantage of libraries such as lxml in publishing and digitization projects is that they allow the developer to freely mix XML-native languages like XPath and XSLT with the expressive, procedural programming style of Python. I’m still amazed by how many people are “parsing” XML using regular expressions (or worse), or using plain CGI/Perl scripts to serve up content. There are easier ways!
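To make the "mixing" concrete, here is a small sketch in which XPath picks out the structure and ordinary Python does the text matching. The sample markup, the `search_reviews` function, and the `div type="review"` convention are assumptions made up for this example, not the project's actual schema or search code.

```python
from lxml import etree

# Namespace map so XPath expressions can use the "tei:" prefix.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample data: two tiny "reviews" in one TEI body.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="review">
        <head>Hamlet</head>
        <p>Een lange moderne clown speelt <name>Hamlet</name>.</p>
      </div>
      <div type="review">
        <head>Faust</head>
        <p>Een korte bespreking.</p>
      </div>
    </body>
  </text>
</TEI>"""


def search_reviews(root, term):
    """Return the headings of reviews whose paragraph text contains term."""
    hits = []
    # XPath selects the review divisions...
    for div in root.xpath("//tei:div[@type='review']", namespaces=NS):
        heading = div.xpath("string(tei:head)", namespaces=NS)
        # ...and Python gathers the text, including text inside nested
        # elements such as <name>, then does a case-insensitive match.
        body = " ".join(
            "".join(p.itertext()) for p in div.xpath("tei:p", namespaces=NS)
        )
        if term.lower() in body.lower():
            hits.append(heading)
    return hits


if __name__ == "__main__":
    root = etree.fromstring(SAMPLE)
    print(search_reviews(root, "clown"))
```

Note that the nested `<name>` element inside the first paragraph is exactly the kind of thing that trips up a regular expression, while the parser handles it without any special casing.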

“Free” doesn’t have to mean primitive. In fact, I would argue that projects like Pinax can jump-start library or digital archive sites into the 21st century with less work than a grad student would spend crafting a bespoke Perl script.

Congratulations to Thomas Crombez and his team!
