Digitization of the collections at Ømålsordbogen – the Dictionary of Danish Insular Dialects: challenges and opportunities
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
Digitization of the collections at Ømålsordbogen – the Dictionary of Danish Insular Dialects: challenges and opportunities. / Hovmark, Henrik; Gudiksen, Asgerd.
In: CEUR Workshop Proceedings, Vol. 2084, 03.04.2018, p. 341-348.
RIS
TY - GEN
T1 - Digitization of the collections at Ømålsordbogen – the Dictionary of Danish Insular Dialects: challenges and opportunities
AU - Hovmark, Henrik
AU - Gudiksen, Asgerd
N1 - Conference code: 3
PY - 2018/4/3
Y1 - 2018/4/3
AB - Ømålsordbogen (the Dictionary of Danish Insular Dialects, henceforth DID) is a historical dictionary giving thorough descriptions of the dialects, i.e. the spoken vernacular of peasants and fishermen, on the Danish isles Zealand, Funen and surrounding islands. It covers the period from 1750 to 1950, the core period being 1850 to 1920. Publishing began in 1992 and the latest volume (11, kurv-lindorm) appeared in 2013, but the project was initiated in 1909 and data collection dates back to the 1920s and 1930s. The project is currently undergoing an extensive process of digitization: old, outdated editing tools have been replaced with modern ones (database, XML, Unicode), and the old, printed volumes have been extracted to XML as well and are now searchable as a single XML file. Furthermore, the underlying physical data collections are being digitized. In the following we give a brief account of the latter digitization process, involving the physical collections, and we discuss a number of questions and dilemmas that this process gives rise to. The collections underlying the DID project comprise a variety of sub-collections characterized by a large heterogeneity in terms of form as well as content. The information on the paper slips is usually densified, often idiosyncratic, and normally complicated to decode, even for other specialists. The digitization process naturally points towards web publication of the collections, either alone or in combination with the edited data, but it also gives rise to a number of questions. Because the current digitization process is very basic, and only very few metadata (one to three items) can be added during scanning, we point out that web publication of the collections presupposes the addition of further, carefully selected metadata, taking different user needs and qualifications into account. We also discuss the relationship between edited and non-edited data from a publication perspective. Some of the paper slips are very difficult to decipher due to handwriting or idiosyncratic densification, and we point out that web publication in a raw form, i.e. non-edited or non-annotated, might be more misleading than helpful for a number of users.
KW - Faculty of Humanities
KW - digital humanities
KW - cultural heritage
KW - dialect dictionaries
KW - digitization
KW - metadata
M3 - Conference article
VL - 2084
SP - 341
EP - 348
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
SN - 1613-0073
Y2 - 7 March 2018 through 9 March 2018
ER -
ID: 195194809