Rebecca Josina Kahn

Towards a Sustainable Linked Data Infrastructure for Museum Collections

The digitisation of cultural heritage data is creating previously unimagined possibilities for humanities researchers. Museum, archive and library materials are now available online, and the growth of large-scale digital research infrastructure projects allows scholars to overcome many of the logistical challenges of using multiple primary sources, such as the fragmentation and geographic dispersal of collections. Meanwhile, the semantic web offers the promise of a method in which computers are able to recognise and create the links between objects, texts and other sources. Many cultural heritage collections have embraced the semantic web as a mechanism for opening up their databases, allowing collections to become interconnected, and creating networks of highly complex, rich and heterogeneous data.

However, many museums face significant technical and logistical challenges when trying to create Linked Open Data (LOD). The highly contextualised nature of humanities sources, and the need to allow for complexity and ambiguity when interpreting them, mean that the rigid data model of the semantic web does not always provide enough contextual data for the complex evidentiary reasoning that takes place in humanities research. At the same time, researchers using this data have to grapple with data of inconsistent quality and depth. The recent COVID-19 pandemic has also highlighted the need for resilient, interoperable frameworks which allow researchers to access data sources remotely, with confidence in the data’s completeness and complexity.

TSLIM aims to address these difficulties for both producers and users of data by developing an evaluation framework with a specific focus on the sustained linking of ethnographic materials. The framework will be informed by a detailed study of both the creation and exploitation of LOD in cultural heritage collections, and will provide the basis for developing a model for expressing complex cultural heritage data online.

At the same time, by conducting a nuanced and rigorous study of both LOD creation and exploitation, the research will contribute to increasingly urgent discussions about how to manage this vast and growing body of data. In a context where searching for digital content is increasingly taken to mean ‘just Google it’, it will be crucial for the data producers (such as museums) and data consumers (such as researchers) of the future to be able to find what they need. This makes it essential to devise new approaches to how the data is stored, categorised and made available. These approaches will need to be both technical and theoretical, in order to address the conceptual and practical realities of how objects and data are to be persistently linked across the web. TSLIM has therefore been designed as a deliberately interdisciplinary project, combining quantitative and qualitative enquiries into how cultural heritage institutions make use of the semantic web and into the ontological underpinnings of how semantic data is described. It will involve collaborative work with scholars across a range of disciplines as well as museum professionals, and will focus specifically on outputs that are practicable and useful to both of these stakeholder groups, ensuring real-world value for each.

The use of the semantic web in the humanities is still in its infancy, and this research will enable an intervention at a crucial early stage in the development of a new field of interdisciplinary research, one which brings together computer scientists, humanities researchers and data scientists. A richly described web of heritage data, aggregated from museums all over the world, has the potential to transform the ways in which we represent and share knowledge. Humanities scholarship will benefit from access to deeper, thicker descriptions of objects, entities and events.