ARTICLE

Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java

2024
ACM Transactions on Software Engineering and Methodology, 34: 1-35

Article link: https://doi.org/10.1145/3688834

Discipline: Computer and information sciences

Author(s): Wenkang Zhong, Chuanyi Li, Kui Liu, Jidong Ge, Bin Luo, Tegawendé F. Bissyandé, Vincent Ng

Tagged author(s): BISSYANDE T. François D'Assise

Entered by: BISSYANDE T. François D'Assise

Abstract

Recent years have seen a rise in Neural Program Repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. A comprehensive understanding of existing systems can facilitate new improvements in this area and provide practical guidance for users. However, we observe two potential weaknesses in the current evaluation of NPR systems: ① published systems are trained with varying data, and ② NPR systems are evaluated only coarsely, through the number of fully fixed bugs. Questions such as "what types of bugs are repairable for current systems" cannot be answered yet. Consequently, researchers cannot make targeted improvements in this area, and users have no clear picture of the actual state of existing systems. In this article, we perform a systematic evaluation of nine existing state-of-the-art NPR systems. To perform a fair and detailed comparison, we (1) build a new benchmark and framework that supports training and validating the nine systems with unified data and (2) evaluate the re-trained systems with detailed performance analysis, especially of their effectiveness and efficiency. We believe our benchmark tool and evaluation results can show practitioners the actual state of current NPR systems and offer implications for further improving NPR.

Keywords

No keywords provided.
