As administrative processes are increasingly digitalised, cybersecurity has become a strategic issue for states, and for Burkina Faso in particular. However, research on cybersecurity in Burkina Faso remains scarce. In this article, we present an approach for identifying vulnerabilities in websites and web applications in Burkina Faso’s cyberspace according to the OWASP Top 10 2021. Implementing this approach enabled us to collect 241 websites and web applications from various fields. A security risk analysis of a sample of 20 of these websites and web applications identified 18,521 web vulnerabilities, which form the basis of a dataset called "BF-WeakWeb-2025". Six of the CWE identifiers found are listed among the 2024 CWE Top 25 Most Dangerous Software Weaknesses. To the best of our knowledge, this is the first study of web vulnerability analysis based on the OWASP Top 10 in Burkina Faso’s cyberspace. The dataset addresses the inadequacy and obsolescence of existing datasets in the field of cybersecurity. It was used to fine-tune three Large Language Models (LLMs), namely BERT, Llama, and Flan-T5, to detect and classify six CWE identifiers: CWE-693, CWE-79, CWE-1021, CWE-352, CWE-264, and CWE-89. Analysis of the results shows that the fine-tuned models classify CWE identifiers with an accuracy of 98% and remain robust on imbalanced data, which demonstrates the quality of the dataset.
Knowledge engineering, Analytical models, Large language models, Cyberspace, Benchmark testing, Aging, Software, Data models, Communications technology, Computer security, OWASP Top 10, Web vulnerability scanners, Fine-tuning, Large Language Model, Web application attacks