Dr. Mohamed Ali Hadj Taieb: University Habilitation defense
Title: Semantic integration and GNN for social data analytics under big data environment.
Defense Date: 11 January 2025

This research examines the complexity and dynamics of data generated by Online Social Networks (OSNs), addressing the challenges posed by Big Data and by its integration from multiple platforms. The work was driven by an urgent demand for advanced techniques for managing and analysing the vast, varied, and rapidly evolving data that these networks generate.

In a digital landscape characterized by continually evolving data generation and management, OSNs such as Facebook, Twitter, and Instagram have become primary sources of massive datasets. These platforms present intricate challenges in data standardization, interpretation, and analysis, especially given the diverse and unstructured nature of OSN data. Traditional data processing methods fall short in managing the volume, variety, and velocity of this data, which underlines the core motivation of this research: to devise more efficient and effective data handling and analysis techniques.

The research confronts several problems inherent in Big Data from OSNs: the complexity of analysing unstructured and varied data, integration challenges across platforms with their own formats and user behaviours, ethical and privacy concerns in data utilization, and the need for scalability and real-time processing in the fast-paced social media environment.

The study makes several key contributions to the field. It includes comprehensive systematic reviews on network representation learning and the Big Data pipeline, which establish the foundation of the research. The Social Network ontology OWL (SNOWL), aimed at the semantic integration of data across multiple social networks, facilitates comprehensive data management and analysis.
Additionally, the BigRDF platform, engineered for the semantic integration of large-scale social data, emphasizes high availability and employs Infrastructure as Code principles, marking a significant stride in Big Data architectures. The BIGUI architecture was developed as an advanced framework for managing large social network graphs, with a focus on anchor users. The OVRAU model, based on deep learning techniques, is designed to learn latent vector representations of anchor users, incorporating both intra- and inter-structural features from various OSNs.

The research is structured into four chapters, each addressing a specific aspect of OSN data. The first chapter explores OSN data modelling, including user profiling, network and relationship modelling, content representation, and graph-based models. The second chapter is dedicated to the development and evaluation of the SNOWL ontology for semantic data integration. The third chapter introduces the OVRAU model for advanced user modelling across different OSNs. The final chapter discusses the Big Data architectures BIGUI and BigRDF, which are essential for managing and analysing social network data, highlighting key features such as fault tolerance, distributed architectures, scalability, and high availability.

In conclusion, the research synthesizes findings in semantic data integration, graph neural networks, and Big Data frameworks within the context of social data analytics. Looking forward, the study aims to explore advanced data management capabilities within modern decentralized and distributed data architectures such as Data Mesh and Data Fabric, and to refine maturity models for assessing an organization's data-driven status. This future work aspires to contribute substantially to the field of data science, targeting efficient data architectures and facilitating advanced, data-driven decision-making processes.
Overall, this research represents a comprehensive and in-depth study of the challenges and advancements in managing and analysing the complex and voluminous data generated by OSNs, offering valuable insights and directions for future research in this rapidly evolving field.