Combining Automatic Annotation with Human Validation for the Semantic Enrichment of Cultural Heritage Metadata
Abstract
The addition of controlled terms from linked open datasets and vocabularies to metadata can increase the discoverability and accessibility of digital collections. However, the task of semantic enrichment requires a lot of effort and resources that cultural heritage organizations often lack. State-of-the-art AI technologies can be employed to analyse textual metadata and match it with external semantic resources. Depending on the data characteristics and the objective of the enrichment, different approaches may need to be combined to achieve high-quality results. What is more, human inspection and validation of the automatic annotations should be an integral part of the overall enrichment methodology. In the current paper, we present a methodology and supporting digital platform, which combines a suite of automatic annotation tools with human validation for the enrichment of cultural heritage metadata within the European data space for cultural heritage. The methodology and platform have been applied and evaluated on a set of datasets on crafts heritage, leading to the publication of more than 133K enriched records to the Europeana platform. A statistical analysis of the achieved results is performed, which allows us to draw some interesting insights as to the appropriateness of annotation approaches in different contexts. The process also led to the creation of an openly available annotated dataset, which can be useful for the in-domain adaptation of ML-based enrichment tools.