Optimizing Elasticsearch search experience using a thesaurus
Authors
Di Pretoro, Emmanuel
De Roock, Edwin
Fremout, Wim
Buelinckx, Erik
Buyle, Stephanie
Van der Stede, Véronique
Discipline
Computer and information sciences
Audience
Scientific
Date
2021Publisher
Code4Lib Journal
Metadata
Show full item recordDescription
The Belgian Art Links and Tools (BALaT) (http://balat.kikirpa.be/) is the continuously expanding online documentary platform of the Royal Institute for Cultural Heritage (KIK-IRPA), Brussels (Belgium). BALaT contains over 750,000 images of KIK-IRPA’s unique collection of photo negatives on the cultural heritage of Belgium, but also the library catalogue, PDFs of articles from KIK-IRPA’s Bulletin and other publications, an extensive persons and institutions authority list, and several specialized thematic websites, each of those collections being multilingual as Belgium has three official languages. All these are interlinked to give the user easy access to freely available information on the Belgian cultural heritage. During the last years, KIK-IRPA has been working on a detailed and inclusive data management plan. Through this data management plan, a new project HESCIDA (Heritage Science Data Archive) will upgrade BALaT to BALaT+, enabling access to searchable registries of KIK-IRPA datasets and data interoperability. BALaT+ will be a building block of DIGILAB, one of the future pillars of the European Research Infrastructure for Heritage Science (E-RIHS), which will provide online access to scientific data concerning tangible heritage, following the FAIR-principles (Findable-Accessible-Interoperable-Reusable). It will include and enable access to searchable registries of specialized digital resources (datasets, reference collections, thesauri, ontologies, etc.). In the context of this project, Elasticsearch has been chosen as the technology empowering the search component of BALaT+. An essential feature of this search functionality of BALaT+ is the need for linguistic equivalencies, meaning a term query in French should also return the matching results containing the equivalent term in Dutch. Another important feature is to offer a mechanism to broaden the search with elements of more precise terminology: a term like “furniture” could also match records containing chairs, tables, etc. This article will explain how a thesaurus developed in-house at KIK-IRPA was used to obtain these functionalities, from the processing of that thesaurus to the production of the configuration needed by Elasticsearch.
Citation
Emmanuel Di Pretoro, Edwin De Roock, Wim Fremout, Erik Buelinckx, Stephanie Buyle & Véronique Van der Stede, 'Optimizing Elasticsearch search experience using a thesaurus', in : Code4Lib Journal, 51 ( 2021-06-14), online, URL : https://journal.code4lib.org/articles/15749 (accessed 15/06/2021)
Identifiers
issn: 1940-5758
Type
Article
Peer-Review
Yes
Language
eng