This was posted as part of a series of blogposts by our team during the Digital Humanities Hackathon 2019 in Helsinki. Original post location here. I heartily recommend DHH as an event to attend!

DH Helsinki Hackathon is a rollercoaster

A quick update on the Genre and Style group in the DHH19. The discussion sessions yesterday ran late, so this post arrives with a small delay.

Day 3 of the hackathon (or day 2 of the blog) was a great example of why these gatherings can lead to unexpected results: we started the day with one question in mind, but along the way decided on something quite different. In our group we are trying to use digital humanities tools to study variation in genre and style in digitized 18th-century English texts.

On Friday, we had to come up with a formulated research question and a research plan based on it. A well-formulated plan would naturally keep us focused and help us do solid and reasonable work during the week.

Our task for Friday

Of course, this is not how our team works! :) We arrived diligently this morning, at 9.15 AM, to study genres in relation to gender. However, by 10.15 we had found out that, of all the texts available to us, only a very small proportion (less than 5%) was written by women. With so few texts, comparing them with the rest of the collection would be quite difficult.

Additionally, we have been made aware that the metadata on the texts is pretty simple and unreliable, and the OCR quality of the texts makes it difficult to do many types of comparisons (although many analyses work surprisingly well even for bad quality texts).

So, we started thinking once again: how could we compare these texts, and how could we get at the genres within them? By now we knew each other a bit, along with what each of us was interested in and what we could do, so we could discuss an interesting topic to study more comfortably.

So, by 11.15 we came up with a different topic: what if we could look at 18th-century texts as a type of network, with similar texts connected to each other? Texts on the same topics would mention the same people, and could be discovered by these means. In this case, we could talk about genres independently of the metadata, which was of uncertain quality and marked up quite unevenly.

Named Entity Recognition finds entities in text through various heuristics and algorithms.
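The network idea described above could be sketched roughly like this. It is only a minimal illustration, not the code we actually ran: the text ids, the names, the `min_shared` threshold, and both helper functions are made up for the example, and the hand-written sets stand in for whatever a real named-entity recognizer would return.

```python
from itertools import combinations

# Hypothetical NER output: text id -> set of person names found in it.
# Both the ids and the names are invented for illustration.
entities_by_text = {
    "sermon_a": {"Moses", "Paul", "Augustine"},
    "sermon_b": {"Paul", "Augustine", "Luther"},
    "pamphlet_c": {"Walpole", "George II"},
    "pamphlet_d": {"Walpole", "George II", "Pitt"},
}


def shared_entity_edges(entities, min_shared=2):
    """Return (text_a, text_b, n_shared) for every pair sharing enough names."""
    edges = []
    for a, b in combinations(sorted(entities), 2):
        shared = entities[a] & entities[b]
        if len(shared) >= min_shared:
            edges.append((a, b, len(shared)))
    return edges


def connected_components(nodes, edges):
    """Group texts into clusters by following the edges between them."""
    neighbours = {n: set() for n in nodes}
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


edges = shared_entity_edges(entities_by_text)
clusters = connected_components(entities_by_text, edges)
print(edges)
print(clusters)
```

With these toy sets, the two sermons end up in one cluster and the two pamphlets in another, without looking at any genre metadata. On real OCR'd texts the names would be much noisier, so the threshold and the matching of name variants would need a lot more care.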

By 12.15 we were testing named-entity recognizers on the old texts, and by 13.15 we had almost forgotten to eat lunch because of it. Eventually we came up with a plan that seemed to make sense from a humanities perspective, seemed technologically feasible, and, most importantly, seemed within reach of our group and interesting enough to try.

So, by 14.15, we had come up with a research strategy and run initial tests, and by 15.15, we were ready with our slides for a presentation. We got some tough (but necessary) questions from the instructors and the audience, and came away with a good way forward. Now, unless the basic steps of the plan fail, we will each have something interesting to do with the data.

Our research question.

We planned to have a working meeting after the presentations at 16.15, but after walking outdoors even briefly (it was 18 degrees outside, which feels like 30!), this turned into a meeting in the park, which gradually moved into more informal discussions (see illustration below).

The team working overtime, avoiding being captured in the photo.

So, within just a few hours, we explored a lot of data, came up with another research plan, formulated it, and made plans for next week. Now that we have gotten to know each other, discussing plans and topics has become easier and easier. It also looks like we should get some very interesting results!

So, turbulent times in the hackathon. Catch up with us here next week* with more info!

Kirnu roller coaster in Linnanmäki amusement park in Helsinki

*- Technically, even this post is cheating, since we have been given strict instructions not to do any work during the weekend. However, as we prepare our minds for the week ahead, it’s good to give a quick overview of where we got to.

This blog post was written by Peeter Tinits, a final-year PhD student at Tallinn University and a digital humanities grunt at the University of Tartu, in Estonia. Attending the hackathon from abroad for the learnings and the funs.