
Software Defect Prediction Using Non-Dominated Sorting Genetic Algorithm and k-Nearest Neighbour Classifier

2024
Mohammad Azzeh, Ali Bou Nassif, Manar Abu Talib and Hajra Iqbal, "Software Defect Prediction Using Non-Dominated Sorting Genetic Algorithm and k-Nearest Neighbour Classifier", e-Informatica Software Engineering Journal, vol. 18, no. 1, pp. 240103, 2024. DOI: 10.37190/e-Inf240103.


Authors

Mohammad Azzeh, Ali Bou Nassif, Manar Abu Talib, Hajra Iqbal

Abstract

Background: Software Defect Prediction (SDP) is a vital step in software development. SDP aims to identify the modules most likely to be defect-prone before the testing phase begins, which helps allocate testing resources and reduces the cost of testing.

Aim: Although many machine learning algorithms have been used to classify software modules based on static code metrics, the k-Nearest Neighbors (kNN) method does not greatly improve defect prediction because it requires careful set-up of multiple configuration parameters before it can be used. To address this issue, we used the Non-dominated Sorting Genetic Algorithm (NSGA-II) to optimize the parameters of the kNN classifier in order to improve SDP accuracy. We used NSGA-II because the existing accuracy metrics often behave differently, giving opposing judgments when evaluating SDP models: changing one parameter might improve one accuracy measure while decreasing the others.
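For readers unfamiliar with this setup, the sketch below shows how kNN parameters can be tuned with NSGA-II under two conflicting objectives. It is not the authors' implementation: it assumes the pymoo and scikit-learn libraries, a synthetic placeholder dataset, and an illustrative choice of tuned parameters (number of neighbours, Minkowski power, and neighbour weighting).

```python
# Minimal sketch (not the authors' code): tuning kNN with NSGA-II via pymoo,
# using MCC and balanced accuracy as two objectives on a placeholder dataset.
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import matthews_corrcoef, balanced_accuracy_score

# Synthetic, imbalanced data standing in for a defect dataset of static code metrics.
X, y = make_classification(n_samples=400, n_features=20, weights=[0.8, 0.2],
                           random_state=0)

class KnnTuning(ElementwiseProblem):
    """Decision variables: k (number of neighbours), p (Minkowski power),
    and a weighting switch (0 = uniform, 1 = distance)."""
    def __init__(self):
        super().__init__(n_var=3, n_obj=2, xl=[1, 1, 0], xu=[30, 2, 1])

    def _evaluate(self, x, out, *args, **kwargs):
        # Round the real-valued genes to valid kNN settings.
        k, p, w = int(round(x[0])), int(round(x[1])), int(round(x[2]))
        clf = KNeighborsClassifier(n_neighbors=k, p=p,
                                   weights="distance" if w else "uniform")
        pred = cross_val_predict(clf, X, y, cv=5)
        # pymoo minimises, so negate both accuracy measures.
        out["F"] = [-matthews_corrcoef(y, pred),
                    -balanced_accuracy_score(y, pred)]

res = minimize(KnnTuning(), NSGA2(pop_size=20), ("n_gen", 10), seed=1, verbose=False)
print(res.X)   # Pareto-optimal kNN settings
print(-res.F)  # their MCC and balanced accuracy
```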

Method: The proposed NSGAII-kNN model was evaluated against the classical kNN model and state-of-the-art machine learning classifiers, namely Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF).
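As a rough illustration of such a baseline comparison (again a sketch under assumed tooling, not the authors' exact experimental protocol), the snippet below evaluates default scikit-learn versions of kNN, SVM, NB, and RF with the same cross-validated measures on a synthetic placeholder dataset.

```python
# Sketch of a baseline comparison: default kNN, SVM, Naive Bayes, and
# Random Forest evaluated with cross-validated MCC and balanced accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, balanced_accuracy_score

X, y = make_classification(n_samples=400, n_features=20, weights=[0.8, 0.2],
                           random_state=0)

baselines = {
    "kNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
}
for name, clf in baselines.items():
    pred = cross_val_predict(clf, X, y, cv=5)
    print(f"{name}: MCC={matthews_corrcoef(y, pred):.3f}, "
          f"BA={balanced_accuracy_score(y, pred):.3f}")
```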

Results: Results indicate that the NSGA-II-optimized kNN model yields a higher Matthews correlation coefficient (MCC) and higher balanced accuracy across the ten studied datasets.
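The two reported measures follow their standard definitions; the short helper below states the textbook formulas in terms of confusion-matrix counts and is not specific to the paper's experiments.

```python
# Standard definitions of the two reported measures, computed from
# confusion-matrix counts (tp, tn, fp, fn).
from math import sqrt

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    # MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Mean of the recall on the defective class and the recall on the clean class.
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

print(mcc(40, 300, 20, 40), balanced_accuracy(40, 300, 20, 40))
```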

Conclusion: The paper concludes that integrating NSGA-II with kNN improved defect prediction when applied to both small and large datasets.

Keywords

software defect prediction, genetic algorithm, multi-objective optimization, k-nearest neighbor

