Workshop: Using National Library of Estonia text collections

I gave a workshop at the Seminar of Digital Archives on the approach we are building at the National Library of Estonia for accessing text collections. The workshop was preceded by a day of interesting talks on this topic at Digital Humanities and Digital Archives.

The current workflow relies on Jupyter Notebooks opened on the same servers as the text files, plus a few custom R commands that retrieve the texts in a clean format. The custom R commands are available as a package on GitHub. The Jupyter Notebooks require a user account on the servers. We used temporary accounts at the workshop; more permanent accounts are available on request.

Figures: Jupyter access example 1, Jupyter access example 2, Jupyter access example 3.

The first tests look promising: texts can be retrieved fairly quickly and analysed with common tools.

The workshop materials were posted on HackMD. Some of the graphs made:

Georg Lurich vs Konrad Mägi - distinguishing words

The distinguishing words between texts containing Georg Lurich and texts containing Konrad Mägi, 1886-1940.
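To give a rough idea of what "distinguishing words" can mean in practice, here is a minimal sketch. The workshop itself used R, and the post does not say which measure was behind the graph, so everything here is an assumption for illustration: the function names, the toy sentences, and the choice of a smoothed log-ratio of relative word frequencies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only (Unicode-aware, so "mägi" stays whole).
    return re.findall(r"[^\W\d_]+", text.lower())


def distinguishing_words(texts_a, texts_b, top_n=5):
    """Rank words by a smoothed log-ratio of relative frequency in A versus B.

    Returns (words most typical of A, words most typical of B).
    This is an illustrative measure, not necessarily the one used in the workshop.
    """
    counts_a = Counter(w for t in texts_a for w in tokenize(t))
    counts_b = Counter(w for t in texts_b for w in tokenize(t))
    vocab = set(counts_a) | set(counts_b)
    # Add-one smoothing so words missing from one side do not give log(0).
    total_a = sum(counts_a.values()) + len(vocab)
    total_b = sum(counts_b.values()) + len(vocab)
    scores = {
        w: math.log((counts_a[w] + 1) / total_a) - math.log((counts_b[w] + 1) / total_b)
        for w in vocab
    }
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n], ranked[-top_n:][::-1]


# Made-up toy sentences, not real newspaper texts.
lurich_texts = ["Lurich võitis maadluse", "maadleja Lurich võitis"]
magi_texts = ["Mägi maalis maastiku", "kunstnik Mägi maalis"]

lurich_words, magi_words = distinguishing_words(lurich_texts, magi_texts, top_n=2)
print(lurich_words, magi_words)
```

On real newspaper text the names themselves would dominate such a list, so in practice one would probably drop the search terms and very rare words before ranking.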

Steam, Electricity, Horses in Estonia 1886-1940

The proportion of texts in Postimees 1886-1940 containing words related to steam, electricity, and horses.
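The calculation behind a graph like this can be sketched as follows: for each year, count the share of texts that contain at least one word from each theme. The workshop used R; this Python version is only an illustration. The word stems for steam and horses ("aur", "hobu"), the function name, and the toy records are assumptions, since the post does not list the actual search terms.

```python
import re
from collections import defaultdict

# Hypothetical stems for illustration only; the workshop's word lists are not given.
THEMES = {
    "steam": ("aur",),
    "electricity": ("elekt",),
    "horses": ("hobu",),
}


def yearly_proportions(records, themes=THEMES):
    """records: iterable of (year, text) pairs.

    Returns {theme: {year: share of that year's texts with a matching word}}.
    """
    totals = defaultdict(int)
    hits = {theme: defaultdict(int) for theme in themes}
    for year, text in records:
        totals[year] += 1
        words = re.findall(r"[^\W\d_]+", text.lower())
        for theme, stems in themes.items():
            # A text counts once per theme, however many matching words it has.
            if any(word.startswith(stems) for word in words):
                hits[theme][year] += 1
    return {
        theme: {year: hits[theme][year] / totals[year] for year in sorted(totals)}
        for theme in themes
    }


# Made-up toy records, not real Postimees articles.
records = [
    (1900, "Hobune vedas vankrit"),
    (1900, "Aurulaev saabus sadamasse"),
    (1930, "Elektrijaam avati"),
    (1930, "Elektrivalgus ja hobused"),
]

shares = yearly_proportions(records)
print(shares)
```

Dividing by the number of texts per year matters here because the amount of digitised text differs a lot between years, so raw counts would mostly show the size of the collection.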

Electricity and Appliances in Estonia 1886-1940

Some selected examples of words containing “elekt” in Estonian and their distribution over time.
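Collecting such word forms is a simple substring search over the tokens of each text, counted per year. A minimal sketch, again in Python rather than the R used at the workshop, with a hypothetical function name and made-up example sentences:

```python
import re
from collections import Counter, defaultdict


def words_containing(records, fragment="elekt"):
    """records: iterable of (year, text) pairs.

    Returns {year: Counter of word forms that contain `fragment`}.
    """
    by_year = defaultdict(Counter)
    for year, text in records:
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            if fragment in word:
                by_year[year][word] += 1
    return by_year


# Made-up toy records for illustration.
elekt_records = [
    (1900, "Elekter on uus"),
    (1930, "Elektripirn ja elektripliit, elektripirn"),
]

elekt_counts = words_containing(elekt_records)
for year in sorted(elekt_counts):
    print(year, dict(elekt_counts[year]))
```

Because Estonian forms many compounds, a substring search like this picks up a whole family of related words at once, which is what makes the selection of examples interesting to plot.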

Session A: Using the texts in DIGAR, the digital archive of the National Library of Estonia (RR) (hands-on workshop). What can be done with digital texts? Access to DIGAR's Estonian articles through open-source code. Simple text processing and its results in R. We will use RStudio; instructions for installing it will be sent to registered participants. The workshop does not require prior programming skills. Nevertheless, we will write code ourselves during the workshop.