Document Type: Original Research Paper


1 Department of Computer Engineering, Islamic Azad University, Kerman Branch, Kerman, Iran.

2 Department of Computer Engineering, Islamic Azad University, Kerman Branch, Kerman, Iran.

3 Department of Computer Engineering, Islamic Azad University, Kerman Branch, Kerman, Iran.

4 Department of Computer Engineering, Islamic Azad University, Kerman Branch, Kerman, Iran.


One of the most important aspects of software project management is estimating the cost and time required to develop an information system. Software managers therefore try to base their estimates on the behavior, properties, and constraints of the project. Software cost estimation is the process of predicting the development requirements of a software system. Various effort estimation models have been proposed in recent years, with a focus on intelligent techniques. This study uses a clustering approach to estimate the effort required for software projects. Effort is estimated with two regression models, SWR (StepWise Regression) and MLR (Multiple Linear Regression), as well as with the CART (Classification And Regression Tree) method. The performance of these methods is evaluated experimentally on real software projects, and clustering of the projects is incorporated into the estimation process. The results indicate that combining clustering with algorithmic estimation techniques can improve the accuracy of the estimates.
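The cluster-then-estimate idea can be illustrated with a short sketch: group historical projects into clusters, fit a separate regression model inside each cluster, and compare its accuracy with a single model fitted to all projects. Several choices here are assumptions not stated in the abstract: k-means (Lloyd's algorithm) as the clustering method, plain ordinary-least-squares MLR as the per-cluster model (SWR and CART are not shown), MMRE as the accuracy measure, and a hypothetical synthetic dataset produced by `make_projects` with two project families (size in KLOC and team experience in years, with made-up effort formulas). It is not the authors' implementation or data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(centroids, x):
    """Index of the centroid closest to feature vector x."""
    return min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means (assumed clustering method); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each project joins its nearest centroid.
        labels = [nearest(centroids, p) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def fit_mlr(X, y):
    """OLS multiple linear regression via the normal equations; returns [b0, b1, ...]."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    n = len(rows[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(beta, x):
    """Effort predicted by coefficients beta for feature vector x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def make_projects(seed=1):
    """Hypothetical synthetic data: two project families with different productivity."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(20):  # small projects, 10..48 KLOC
        size, exp = 10 + 2 * i, rng.uniform(1, 10)
        X.append([size, exp])
        y.append(2.0 * size - 1.5 * exp + 30 + rng.gauss(0, 1))
    for i in range(20):  # large projects, 200..390 KLOC
        size, exp = 200 + 10 * i, rng.uniform(1, 10)
        X.append([size, exp])
        y.append(5.0 * size - 20.0 * exp + 150 + rng.gauss(0, 5))
    return X, y

X, y = make_projects()

# Baseline: one MLR model fitted to all projects.
global_beta = fit_mlr(X, y)
global_mmre = mmre(y, [predict(global_beta, x) for x in X])

# Cluster first, then fit one MLR model per cluster.
centroids, labels = kmeans(X, k=2)
models = {}
for c in set(labels):
    Xc = [x for x, lab in zip(X, labels) if lab == c]
    yc = [t for t, lab in zip(y, labels) if lab == c]
    models[c] = fit_mlr(Xc, yc)
clustered_mmre = mmre(y, [predict(models[lab], x) for x, lab in zip(X, labels)])

print(f"global MMRE: {global_mmre:.3f}  clustered MMRE: {clustered_mmre:.3f}")
```

For brevity, accuracy here is measured in-sample; a fair comparison would use hold-out or leave-one-out validation. A new project would be routed to its nearest centroid with `nearest(centroids, x)` and estimated with that cluster's model.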

