Towards a Semantic Architecture for Data Lakes
Springer Nature Switzerland, pp. 105-116
Abstract
A data lake is a storage system for large volumes of raw, heterogeneous data, organized according to an area-based architecture. The main weakness of this architecture is that raw data is extracted and stored without any control over its content, which makes later processing and access difficult. In this paper, we propose integrating an ontology into this architecture, specifically in the data extraction and data access areas. In the extraction area, the ontology resolves ambiguities and terminological confusion so that data is extracted reliably. In the access area, the ontology transforms simple queries into semantic data access queries. The resulting architecture improves the data preparation and data access stages, both of which are essential for exploiting the stored data.
Keywords
Data lake, Architecture, Ontology, Area, Semantic
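
The two roles the abstract assigns to the ontology can be illustrated with a minimal sketch. This is not the paper's implementation: the toy `ONTOLOGY` dictionary, the function names, and the example terms are all hypothetical, and a real system would more likely rely on an OWL/RDF ontology and a query language such as SPARQL. The sketch only shows the idea of mapping ambiguous raw terms to one concept at extraction time and expanding a simple keyword query into its semantically equivalent terms at access time.

```python
# Hypothetical toy ontology: canonical concept -> set of synonymous labels.
ONTOLOGY = {
    "customer": {"client", "buyer", "customer"},
    "invoice": {"bill", "invoice"},
}


def canonical_concept(term):
    """Return the canonical concept for a term, or None if it is unknown."""
    normalized = term.strip().lower()
    for concept, labels in ONTOLOGY.items():
        if normalized in labels:
            return concept
    return None


def annotate_source(field_names):
    """Extraction area (assumed role): map raw field names to concepts."""
    return {name: canonical_concept(name) for name in field_names}


def expand_query(keywords):
    """Access area (assumed role): expand each keyword to all labels
    of its concept; unknown keywords are kept as-is, lowercased."""
    expanded = {}
    for keyword in keywords:
        concept = canonical_concept(keyword)
        if concept is None:
            expanded[keyword] = [keyword.strip().lower()]
        else:
            expanded[keyword] = sorted(ONTOLOGY[concept])
    return expanded
```

For example, `annotate_source(["Client", "Bill"])` would tag both fields with their canonical concepts, and `expand_query(["buyer"])` would return every label of the `customer` concept, so that a search for "buyer" also reaches data stored under "client".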