Vol. 25 Issue 4 2022


Dipalika Das1*, Maya Nayak2, Subhendu Kumar Pani3

1 Dipalika Das, Research Scholar, Department of Computer Science and Engineering, Biju Patnaik University of Technology, Rourkela, Odisha, India

2 Dr. Maya Nayak, Dean, School of Computer Studies, Ajay Binay Institute of Technology (ABIT), Cuttack, Biju Patnaik University of Technology (BPUT), Rourkela, Odisha, India

3 Dr. Subhendu Kumar Pani, Professor, Krupajal Engineering College (KEC), Bhubaneswar, Biju Patnaik University of Technology (BPUT), Rourkela, Odisha, India

Identification of missing values in time-series data samples is a complex signal processing task that involves pattern analysis, pre-emptive modelling, and regression techniques. Researchers have proposed a wide variety of models to improve the efficiency of missing value identification, but most of them are highly complex and cannot be used for large-scale datasets. Moreover, the simpler models that can be applied to large-scale datasets have low efficiency, which limits their use in real-time applications. To overcome these issues, this paper proposes a novel Elephant Herding Optimization (EHO) model for tuning an efficient ensemble classifier for missing value identification, which can be used for feature-based data samples. The proposed model combines Deep Forest (DF), Support Vector Machine (SVM), Naïve Bayes (NB), and k-Nearest Neighbour (kNN) classifiers for correlative analysis of missing value samples. The EHO model optimizes the efficiency of this ensemble by identifying classifier hyperparameters that improve the performance of the missing value identification process. It uses a fitness function that combines the accuracy, precision, and recall obtained when evaluating the effectiveness of missing value identification. To evaluate its performance, the model was applied to multiple large-scale datasets, and improvements of 9.5% in accuracy, 8.3% in precision, and 4.5% in recall were observed when compared with standard regression-based pre-emption models. These results indicate that the proposed method is highly scalable and can be applied to multidomain use cases.
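As an illustration of a fitness function that combines accuracy, precision, and recall, the following minimal Python sketch scores one candidate set of predictions. The abstract does not state how the three measures are combined, so the equally weighted sum, the function name `fitness`, and the binary labelling (1 for a missing value, 0 otherwise) are assumptions made only for this example.

```python
def fitness(y_true, y_pred, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of accuracy, precision, and recall for binary labels.

    Assumption: label 1 marks a missing value, label 0 a present value.
    Assumption: the three measures are combined as a weighted sum; the
    equal default weights are illustrative, not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    w_acc, w_prec, w_rec = weights
    return w_acc * accuracy + w_prec * precision + w_rec * recall


if __name__ == "__main__":
    # Hypothetical labels, used only to show the call.
    truth = [1, 0, 1, 1, 0]
    predicted = [1, 0, 0, 1, 1]
    print(round(fitness(truth, predicted), 4))
```

In an optimization loop, a search procedure such as EHO would evaluate a score of this kind for each candidate hyperparameter setting and keep the settings with the highest value.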