Enhancing Linked Data Trustability and Transparency through Knowledge-driven Data Ecosystems

July 02, 2021 by Julia Holze

Prof. Dr. (Univ. Simón Bolívar) Maria-Esther Vidal leads the Scientific Data Management (SDM) group at TIB and the Leibniz Joint-Lab with the L3S Research Center, and she is a retired full professor at Univ. Simón Bolívar (USB), Venezuela. She has contributed to the areas of data management, semantic data integration, and knowledge graphs. She has coauthored more than 180 peer-reviewed articles and has an h-index of 28 (Google Scholar). She has served on the editorial boards of journals (e.g., JWS, JDIQ) and as general chair, co-chair, and senior reviewer of major scientific events (e.g., WWW, ISWC, AAAI, AMW, CIKM). She serves as an expert on advisory boards and in doctoral schools, several summer schools, and doctoral consortia. She has advised more than 20 doctoral students and more than 120 master's and bachelor's students in Computer Science. She was a senior scientist at the Fraunhofer Institute (IAIS) (2016-18) and a visiting professor at various universities (Univ. Maryland, Univ. Nantes, the Karlsruhe Institute of Technology, Univ. Bonn, Polytechnic University of Madrid, and Polytechnic University of Catalunya). She is leading data management tasks in the EU H2020 projects iASiS, BigMedilytics, and QualiChain, and has participated in BigDataEurope and BigDataOcean. In the past, she has participated in international projects (e.g., FP7, NSF, AECI) and led industrial data integration projects for more than 10 years (e.g., Bell South, Telefonica). In this interview she speaks about her research areas and the upcoming DBpedia Day on September 9 at the SEMANTiCS conference.

SEMANTiCS: You have been working on many areas of research (semantic web technologies, knowledge management and representations). What will shape our future the most? 

Maria-Esther Vidal: Knowledge graphs have gained momentum as expressive data structures to represent the convergence of data and knowledge spread across various data sources. Years of research on semantic data management and knowledge engineering have paved the way for merging numerous data sources into a vast and distributed knowledge graph, the Linked Open Data (LOD) cloud. Furthermore, techniques for query processing, data storage, and knowledge representation and discovery make it possible to use the vast amount of knowledge integrated in LOD.

However, despite the acceptance of database management systems as crucial data processing tools for industrial and scientific database applications, the scenario is not necessarily the same for solutions that demand analytics against LOD. Real-world applications require a complete understanding of all the decisions made during data management. The absence of algorithmic methods to account for LOD transparency considerably affects trustability and prevents LOD-based solutions from being fully accepted as reliable tools for decision-making. The challenge is to make available semantic-based formalisms and tools that facilitate the documentation and interpretation of all the processes conducted to merge data into LOD, and to traverse and uncover potential novel patterns encoded in this significant source of knowledge.

SEMANTiCS: What are your research areas? How do they contribute to the advancement of Data Science?

Maria-Esther Vidal: My research goals are:

  1. Develop computational methods to integrate, curate, publish and access heterogeneous data sources into knowledge graphs.
  2. Define formalisms and implement tools to enhance transparency and explainability during the knowledge graph creation process.
  3. Develop query processing methods to access and validate federations of knowledge graphs.
  4. Devise Artificial Intelligence techniques to uncover patterns and associations in knowledge graphs. 

My contribution to data science is two-fold. We investigate computational methods for efficiently managing data during the processes of knowledge graph creation and management. Additionally, to enhance transparency and ensure the responsible management of data, we research knowledge-driven approaches to document and trace all the steps required to transform heterogeneous data into actionable knowledge. 

SEMANTiCS: Lately, you have been working in the Health Domain. Can you elaborate how you will introduce the concept of Knowledge-driven Data Ecosystems in this domain? Any lessons learned or available tools?  

Maria-Esther Vidal: The complexity of health and biomedical systems brings challenges at data management and knowledge representation levels. The wide variety of biomedical concepts demands expressive formalisms to precisely model their properties, associations, and taxonomic relationships and capture the evolution of these concepts over time. Furthermore, the vast number of data sources that present data about biomedical concepts in a myriad of formats demands novel data integration techniques. Additionally, access to biomedical data, and, in particular, health data, is usually controlled by complex regulations. More importantly, real-world applications in this domain demand a complete understanding of all the decisions made during all data management steps.

Data ecosystems represent keystone-player or alliance-driven decentralized infrastructures to enable effective and safe data sharing and exchange among independent stakeholders. They can be equipped with metadata, common regulations, data services, and business models and provide the basis for transparent data management. This makes knowledge-driven ecosystems promising infrastructures for building traceable biomedical data-driven pipelines and trustable solutions to support decision-making.

We have built knowledge-driven data ecosystems in the biomedical domain. They enable sharing biomedical data across several stakeholders and are equipped with data management methods to transform data collected in different formats into knowledge graphs. Semantic data models and ontologies provide the meaning of the data sources exchanged in these data ecosystems. Additionally, mapping rules represent declarative definitions of the data sources in terms of a unified schema and biomedical ontologies, while W3C standards like SHACL enable the verification of domain-specific integrity constraints.
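
To make the idea concrete, here is a minimal sketch in plain Python of how declarative mapping rules, an integrity check, and provenance can fit together. It is only an illustration and not the tooling used in these ecosystems: the rule format, the `ex:` names, the sample rows, and the helper functions are all hypothetical, and the check merely imitates the spirit of a SHACL minimum-count constraint rather than implementing SHACL.

```python
from dataclasses import dataclass

# Hypothetical source rows, standing in for one stakeholder's tabular export.
CLINIC_ROWS = [
    {"patient_id": "p1", "dx": "condition-A", "drug": "drug-X"},
    {"patient_id": "p2", "dx": "", "drug": "drug-Y"},
]

# Declarative mapping rules (assumed format): rule name, source column, predicate.
MAPPING_RULES = [
    ("rule-diagnosis", "dx", "ex:hasDiagnosis"),
    ("rule-treatment", "drug", "ex:treatedWith"),
]


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: the data source the value came from
    rule: str    # provenance: the mapping rule that produced the triple


def apply_mappings(rows, rules, source_name):
    """Turn rows into triples by applying every rule to every row."""
    triples = []
    for row in rows:
        subject = f"ex:patient/{row['patient_id']}"
        for rule_name, column, predicate in rules:
            value = row.get(column, "")
            if value:  # empty cells produce no triple
                triples.append(
                    Triple(subject, predicate, value, source_name, rule_name)
                )
    return triples


def check_min_count(triples, subjects, predicate, min_count=1):
    """Return the subjects with fewer than min_count values for predicate."""
    violations = []
    for subject in subjects:
        count = sum(
            1 for t in triples
            if t.subject == subject and t.predicate == predicate
        )
        if count < min_count:
            violations.append(subject)
    return violations


triples = apply_mappings(CLINIC_ROWS, MAPPING_RULES, "clinic.csv")
subjects = [f"ex:patient/{row['patient_id']}" for row in CLINIC_ROWS]
violations = check_min_count(triples, subjects, "ex:hasDiagnosis")
```

In this toy run the second patient has no diagnosis, so `violations` contains `ex:patient/p2`. Because every triple carries the source and the rule that produced it, each statement in the resulting graph can be traced back to where it came from.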

These empowered data ecosystems facilitate the traceability of the data management processes required to transform heterogeneous data into a structured knowledge graph. The stakeholders actively contribute to and interact with the data ecosystems, and trustworthiness has been enhanced. Despite these positive outcomes, various challenges need to be addressed before data-driven tools can be fully adopted in domains like biomedicine. A complete understanding of the data characteristics and of all the decisions made during data management and by artificial intelligence (AI) tools is mandatory for fully adopting these frameworks as reliable solutions for decision-making. Thus, a paradigm shift in semantic data integration towards explainable data-driven AI is still needed.

SEMANTiCS: What are your expectations about Semantics 2021 in Amsterdam and DBpedia Day? 

Maria-Esther Vidal: Semantics and the DBpedia Day are venues where researchers and practitioners of semantic technologies meet to discuss semantics in real-world data management and AI problems. Given the potential of knowledge-driven data ecosystems, both events represent excellent venues for discussing the new challenges that realizing trustable data ecosystems brings to the semantic web. I am looking forward to fruitful discussions and collaborations.