Skriptoorium: fast and automated processing of text corpora and its application in archiving texts at the Department of Folkloristics, Estonian Literary Museum

Andres Kuperjanov

Since the first folklore materials were entered in the early 1990s, the Department of Folkloristics of the Estonian Literary Museum has used an input format suited to the mass processing of texts. To date, nearly 150,000 texts of different folklore genres and on different themes have been transcribed from manuscripts. In the current age of enthusiasm for databases, this solution offers a practical alternative, since content-oriented processing does not require a physical database. Nor is there any need for complex tabulation of metadata or for creating further descriptive data, since all processing is done on the actual text and the required information is integrated directly into each corpus entry. The processor also includes a graphical output option that emulates the basic functions of online databases. The digitized texts, stored as archive files in the department’s digital archives, are also easily convertible to other formats.

Skriptoorium is related to the projects EKKM09-168 (Development and introduction of online applications of Estonian language, culture and folklore) and EKKM09-159 (Monumenta Antiquae legend and incantation anthologies), and to the state-financed research project SF0030181s08 (Narrative aspects of folklore: power, personality and globalization).