IDOL: Comprehensive & Complete LOD Insights

Research & Innovation

Over the last decade, the number of RDF datasets published on the Web of Data has steadily increased. The decentralized nature of the web, however, makes it hard to identify all of these datasets. Moreover, even when downloadable data distributions are discovered, the available metadata is often insufficient to describe the datasets properly, which hinders their usefulness and reuse.

In this paper, we describe an attempt to exhaustively identify the whole Linked Open Data cloud by harvesting metadata from multiple sources, and we provide insights into duplicated data and the general quality of the available metadata. This was only possible by using a probabilistic data structure called a Bloom filter. Finally, we enrich existing dataset metadata with our approach and republish it through a SPARQL endpoint.
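
To illustrate why a Bloom filter helps with duplicate detection at this scale, here is a minimal sketch in Python. It is not the implementation used in the paper: the class name BloomFilter, the sizing parameters, the SHA-256 based double hashing, and the example.org IRIs are all assumptions chosen for illustration. The idea is to index the resource IRIs of one dataset in a compact bit array and then test the IRIs of another dataset against it. A Bloom filter never reports a false negative, but it may report false positives, so matches are only candidates for overlap.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical resource IRIs standing in for two harvested datasets.
dataset_a = [
    "http://example.org/resource/Leipzig",
    "http://example.org/resource/Berlin",
    "http://example.org/resource/Dresden",
]
dataset_b = [
    "http://example.org/resource/Leipzig",
    "http://example.org/resource/Hamburg",
]

bloom = BloomFilter(expected_items=len(dataset_a))
for iri in dataset_a:
    bloom.add(iri)

# IRIs of dataset_b that *may* also occur in dataset_a (false positives possible).
possible_overlap = [iri for iri in dataset_b if iri in bloom]
print(possible_overlap)
```

The memory footprint depends only on the expected number of items and the chosen false-positive rate, not on the length of the IRIs, which is what makes pairwise comparisons across many datasets feasible.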

Speakers:

Ciro Baron Neto

Leipzig University
http://bis.informatik.uni-leipzig.de/en/Welcome

Dimitris Kontokostas

Fondazione Bruno Kessler, Leipzig University
www.fbk.eu

Gustavo Publio

Leipzig University
http://bis.informatik.uni-leipzig.de/en/Welcome

Diego Esteves

AKSW, Universität Leipzig
www.aksw.org

Diego Esteves is currently a PhD student at the AKSW - Smart Data Analytics Research Group. Before joining the AKSW group in Leipzig, he worked for over 10 years in large companies such as Accenture, B2W Inc., Wilson Sons and BTG Pactual Investment Bank.

Amit Kirschenbaum

Sebastian Hellmann

