2023
"Story Point Estimation Using Issue Reports With Deep Attention Neural Network", In e-Informatica Software Engineering Journal, vol. 17, no. 1, article 230104, 2023.
DOI: 10.37190/e-Inf230104.
Authors
Haithem Kassem, Khaled Mahar, Amani A. Saad
Abstract
Background: Estimating the effort required for software engineering tasks is notoriously difficult, yet it is critical for project planning. In the agile community, issue reports are frequently used to describe tasks, and story points are used to estimate task effort.
Aim: This paper proposes a machine learning regression model for estimating the number of story points needed to complete a task. The model is trained end-to-end from raw input data, without the need for manual feature engineering.
Method: The proposed model uses hierarchical attention networks, with attention mechanisms at two levels: words and sentences. The model gradually constructs a document vector by aggregating salient words into sentence vectors and then merging salient sentence vectors into a document vector. The document vector is then fed into a shallow neural network that predicts the story point (a minimal sketch of such an architecture follows).
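To make the method concrete, here is a minimal sketch of a two-level (word and sentence) attention regressor in PyTorch. The class names, dimensions, bidirectional GRU encoders, and the smoke test are illustrative assumptions for this sketch; the paper's exact architecture and hyperparameters may differ.

import torch
import torch.nn as nn

class Attention(nn.Module):
    # Additive attention: pools a sequence of vectors into a single vector.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, x):                      # x: (batch, seq, dim)
        u = torch.tanh(self.proj(x))           # per-position hidden representation
        weights = torch.softmax(self.context(u), dim=1)
        return (weights * x).sum(dim=1)        # attention-weighted sum: (batch, dim)

class HANRegressor(nn.Module):
    # Word-level attention builds sentence vectors; sentence-level attention
    # builds a document vector; a shallow head predicts the story point.
    def __init__(self, vocab_size, embed_dim=100, hidden=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # could be initialised from GloVe
        self.word_gru = nn.GRU(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.word_attn = Attention(2 * hidden)
        self.sent_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.sent_attn = Attention(2 * hidden)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))    # shallow regression head

    def forward(self, docs):                   # docs: (batch, n_sents, n_words) of token ids
        b, s, w = docs.shape
        words = self.embed(docs.view(b * s, w))
        encoded, _ = self.word_gru(words)
        sent_vecs = self.word_attn(encoded).view(b, s, -1) # one vector per sentence
        encoded, _ = self.sent_gru(sent_vecs)
        doc_vec = self.sent_attn(encoded)                  # one vector per document
        return self.head(doc_vec).squeeze(-1)

# Smoke test: 4 issue reports, 6 sentences each, 20 tokens per sentence.
model = HANRegressor(vocab_size=5000)
print(model(torch.randint(0, 5000, (4, 6, 20))).shape)     # torch.Size([4])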
Results: The experiments show that the proposed approach outperforms Deep-SE, the state-of-the-art technique based on recurrent highway networks. The proposed model improves Mean Absolute Error (MAE) by an average of 16.6% and Median Absolute Error (MdAE) by an average of 53%.
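For reference, the two reported metrics are straightforward to compute. The sketch below uses made-up story-point values purely for illustration; the paper's project data is not reproduced here.

import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: the mean of |actual - predicted| story points.
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mdae(y_true, y_pred):
    # Median Absolute Error: the median of |actual - predicted| story points.
    return np.median(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

actual    = [1, 2, 3, 5, 8]      # hypothetical story points for five issues
predicted = [1.5, 2, 4, 4, 10]   # hypothetical model estimates
print(mae(actual, predicted))    # 0.9
print(mdae(actual, predicted))   # 1.0
# A relative improvement such as "16.6% lower MAE" corresponds to
# (MAE_baseline - MAE_model) / MAE_baseline * 100.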
Conclusion: The empirical evaluation shows that the proposed approach outperforms prior work.
Keywords
story points, deep learning, GloVe, hierarchical attention networks, agile, planning poker