Applying Machine Learning to Software Fault Prediction

2018
Bartłomiej Wójcicki and Robert Dąbrowski, "Applying Machine Learning to Software Fault Prediction", e-Informatica Software Engineering Journal, vol. 12, no. 1, pp. 199–216, 2018. DOI: 10.5277/e-Inf180108.


Authors

Bartłomiej Wójcicki, Robert Dąbrowski

Abstract

Introduction: Software engineering continuously suffers from inadequate software testing. Automated prediction of possibly faulty fragments of source code allows developers to focus their efforts on fault-prone fragments first. Fault prediction has been the topic of many studies concentrating on C/C++ and Java programs, with little attention paid to languages such as Python.

Objectives: In this study the authors verify whether the type of approach used in earlier fault-prediction studies can be applied to Python. More precisely, the primary objective is to conduct preliminary research, using simple methods, to support (or contradict) the expectation that predicting faults in Python programs is also feasible. The secondary objective is to establish grounds for more thorough future research and publications, provided that the preliminary research yields promising results.

Methods: It has been demonstrated that, using machine learning techniques, faults in C/C++ and Java projects can be predicted with a recall of 0.71 at a false positive rate of 0.25. A similar approach was applied in order to find out whether comparable results can be obtained for Python projects. The working hypothesis is that choosing Python as the programming language does not significantly alter those results. A preliminary study was conducted in which a basic machine learning technique was applied to a few sample Python projects. Success would indicate that the selected approach is worth pursuing, since results similar to those for C/C++ and Java can be obtained for Python; failure would indicate that the approach is not appropriate for the selected group of Python projects.

Results: The research provides experimental evidence that fault-prediction methods similar to those developed for C/C++ and Java programs can be successfully applied to Python programs, achieving a recall of up to 0.64 at a false positive rate of 0.23 (mean recall 0.53 at a false positive rate of 0.24). This indicates that more thorough research in this area is worth conducting.

Conclusion: Having obtained promising results with this simple approach, the authors conclude that research on predicting faults in Python programs using machine learning techniques is worth pursuing. Natural ways to extend it include using more sophisticated machine learning techniques, additional Python-specific features, and larger data sets.
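To make the setup concrete, below is a minimal sketch of the kind of pipeline the abstract describes: a Naïve Bayes classifier trained on static code metrics and evaluated by recall and false positive rate. It is an illustration under stated assumptions, not the authors' implementation: the feature values are invented, and scikit-learn's GaussianNB stands in for the exact learner configuration and preprocessing used in the paper.

```python
# Sketch of a metrics-based fault-prediction pipeline (illustrative data).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Hypothetical feature matrix: one row per module, columns are static
# metrics such as lines of code, cyclomatic complexity, comment ratio.
X = np.array([[120, 14, 0.05],
              [ 30,  2, 0.30],
              [200, 25, 0.02],
              [ 45,  3, 0.25],
              [310, 40, 0.01],
              [ 60,  5, 0.20],
              [150, 18, 0.04],
              [ 25,  1, 0.35]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = fault-prone, 0 = fault-free

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

clf = GaussianNB().fit(X_train, y_train)
pred = clf.predict(X_test)

# Recall = TP / (TP + FN); false positive rate = FP / (FP + TN).
tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
print("recall =", tp / (tp + fn), "fpr =", fp / (fp + tn))
```

With real data the same two numbers, recall and false positive rate, are the quantities the abstract reports (e.g. recall 0.64 at false positive rate 0.23).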

Keywords

classifier, fault prediction, machine learning, metric, Naïve Bayes, Python, quality, software intelligence
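The keywords emphasize metric-based features for Python code. As one hedged illustration of how such static metrics can be extracted, the sketch below uses the radon package to compute raw size metrics and McCabe cyclomatic complexity; this is an assumed tool choice for the example, not necessarily the tooling used in the paper.

```python
# Illustrative only: extracting static metrics from Python source with the
# radon package; the paper's own feature-extraction tooling may differ.
from radon.complexity import cc_visit
from radon.raw import analyze

source = '''
def triage(reports):
    critical = []
    for r in reports:
        if r.get("severity") == "high" and not r.get("resolved"):
            critical.append(r)
    return critical
'''

raw = analyze(source)  # raw size metrics: LOC, SLOC, comment lines, ...
print("LOC:", raw.loc, "SLOC:", raw.sloc, "comments:", raw.comments)

for block in cc_visit(source):  # McCabe cyclomatic complexity per function
    print(block.name, "complexity:", block.complexity)
```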

