A framework for integrating biomedical knowledge in Wikidata with open biological and biomedical ontologies and MeSH keywords

Abstract

This study presents a framework for enhancing Wikidata as an open, collaborative knowledge graph by integrating Open Biological and Biomedical Ontologies (OBO) and Medical Subject Headings (MeSH) keywords from PubMed publications. The primary data sources, OBO ontologies and MeSH keywords, were collected and classified using SPARQL queries over RDF knowledge graphs. An evaluation of the semantic alignment between OBO ontologies and Wikidata revealed significant gaps and distorted representations that call for both automated and manual correction. We used pointwise mutual information to extract biomedical relations among the 5000 most common MeSH keywords in PubMed, achieving an accuracy of 89.40 % for superclass-based classification and 75.32 % for relation type-based classification. The Integrated Gradients method was also used to refine the classification by removing irrelevant MeSH qualifiers, which improved overall efficiency. In addition, we explored the use of MeSH keywords to identify PubMed reviews that could support currently unsupported Wikidata relations, and found that 45.8 % of these relations were not present in PubMed, which points to potential inconsistencies in Wikidata. The contributions of this study are improved methodologies for enriching Wikidata with biomedical information, validated semantic alignments, and efficient classification processes. This work enhances the interoperability and multilingual capabilities of biomedical ontologies and demonstrates the role of MeSH keywords in verifying semantic relations, thereby contributing to the robustness and accuracy of collaborative biomedical knowledge graphs.
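
As an illustration of the pointwise mutual information step mentioned in the abstract, the following is a minimal sketch using the standard definition PMI(a, b) = log2(p(a, b) / (p(a) p(b))), estimated from keyword co-occurrence counts. The abstract does not give the exact formulation or preprocessing used in the study, so the function name `pmi_scores`, the input layout (one list of MeSH keywords per publication), and the toy records are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(records):
    """Return PMI for every keyword pair that co-occurs in at least one record.

    `records` is a list of keyword lists, one per publication (hypothetical
    layout; the study's real input format is not described in the abstract).
    """
    n = len(records)
    single = Counter()  # number of records containing each keyword
    pair = Counter()    # number of records containing both keywords of a pair
    for keywords in records:
        unique = sorted(set(keywords))  # sort so each pair has one canonical key
        single.update(unique)
        pair.update(combinations(unique, 2))

    scores = {}
    for (a, b), joint in pair.items():
        # p(a,b) / (p(a) * p(b)) simplifies to joint * n / (count_a * count_b)
        scores[(a, b)] = math.log2(joint * n / (single[a] * single[b]))
    return scores


# Toy, made-up annotation sets (not data from the study).
records = [
    ["Diabetes Mellitus", "Insulin"],
    ["Diabetes Mellitus", "Insulin"],
    ["Diabetes Mellitus", "Hypertension"],
    ["Hypertension", "Aspirin"],
]
scores = pmi_scores(records)
for keyword_pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(keyword_pair, round(value, 3))
```

A positive score means the two keywords co-occur more often than independence would predict, and a negative score means less often. Pairs that never co-occur are simply omitted here rather than assigned negative infinity, which is a design choice of this sketch.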

Khalil Chebil
Assistant professor in computer science

My research interests include decision aid algorithms, optimization and linear programming.

Mohamed Ali Hadj Taieb
Assistant professor

My research interests include semantic similarity, semantic relatedness, knowledge representation, Big Data, social media, data management systems and graph embedding.

Mohamed Ben Aouicha
Professor

My research interests concern information retrieval, semantic technologies, social media analytics, knowledge representation, Big Data and graph embedding.