"Multi-view learning for software defect prediction", e-Informatica Software Engineering Journal, vol. 15, no. 1, pp. 163–184, 2021.
DOI: 10.37190/e-Inf210108
Authors
Elife Ozturk Kiyak, Derya Birant, Kokten Ulas Birant
Abstract
Background: Machine learning algorithms have traditionally been applied to software defect prediction using single-view data, in which each instance is described by a single feature vector. However, different software engineering data sources may carry multiple, partially independent kinds of information, which makes the standard single-view approaches ineffective.
Objective: To overcome the single-view limitation of current studies, this article proposes using a multi-view learning method for software defect classification problems.
Method: The Multi-View k-Nearest Neighbors (MVKNN) method was applied in the software engineering field. In this method, a base classifier is first constructed to learn from each view, and the base classifiers are then combined to create a robust multi-view model.
Results: In the experimental studies, our algorithm (MVKNN) was compared with the standard k-nearest neighbors (KNN) algorithm on 50 datasets obtained from different software bug repositories. The experimental results demonstrate that MVKNN outperformed KNN on most of the datasets in terms of accuracy. The average accuracy of MVKNN was 86.59%, 88.09%, and 83.10% on the NASA MDP, Softlab, and OSSP datasets, respectively.
Conclusion: The results show that using multiple views (MVKNN) can usually improve classification accuracy compared to a single-view strategy (KNN) for software defect prediction.
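The idea summarized under Method (one base classifier per view, then a combination step) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the per-view classifiers are combined, so a simple majority vote is assumed here, and the function names, the three views, and the toy data are all hypothetical.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training instances by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def mvknn_predict(train_views, train_y, x_views, k=3):
    """Run one KNN per view, then combine the per-view labels by majority vote.

    The majority-vote combination is an assumption made for this sketch.
    """
    per_view = [
        knn_predict(view, train_y, x, k)
        for view, x in zip(train_views, x_views)
    ]
    return Counter(per_view).most_common(1)[0][0]


# Hypothetical toy data: five modules described by three single-feature views.
# Label 1 = defective, 0 = not defective.
train_views = [
    [[10.0], [12.0], [50.0], [55.0], [60.0]],  # view 0
    [[1.0], [2.0], [8.0], [9.0], [7.0]],       # view 1
    [[1.0], [2.0], [3.0], [20.0], [21.0]],     # view 2
]
train_y = [0, 0, 1, 1, 1]

# One query module, given as one feature vector per view.
x_views = [[52.0], [8.5], [2.0]]

prediction = mvknn_predict(train_views, train_y, x_views, k=3)
print(prediction)
```

In this toy case the third view on its own votes for the non-defective class, while the other two views outvote it. An odd number of views and an odd k keep both votes free of ties for binary labels.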
Keywords
Software defect prediction, multi-view learning, machine learning, k-nearest neighbors