Novel Data Mining Techniques for Incomplete Clinical Data in Diabetes Management

Herbert F. Jelinek

Centre for Research in Complex Systems and School of Community Health, Charles Sturt University, PO Box 789, Albury, NSW 264, Australia.

Andrew Yatsko

Centre for Informatics and Applied Optimisation, Federation University, PO Box 663, University Drive, Mt Helen Vic 3350, Australia.

Andrew Stranieri

Centre for Informatics and Applied Optimisation, Federation University, PO Box 663, University Drive, Mt Helen Vic 3350, Australia.

Sitalakshmi Venkatraman *

Department of Higher Education - Business (IT), Northern Melbourne Institute of TAFE, 77-91 St Georges Rd, Preston Victoria 3072, Australia.

*Author to whom correspondence should be addressed.


Abstract

An important part of health care involves maintaining and interpreting medical databases that contain patient records for clinical decision making, diagnosis and follow-up treatment. Missing clinical entries make it difficult to apply data mining algorithms for clinical decision support. This study demonstrates that conventional data mining algorithms can achieve higher predictive accuracy if missing values are handled appropriately. We propose a novel algorithm that uses a convolution of sub-problems to form a super problem, whose classes are defined by the Cartesian product of the class values of the underlying problems, and that applies Incomplete Information Dismissal and Data Completion techniques to reduce features and impute missing values. The predictive accuracies of Decision Branch, Nearest Neighborhood and Naïve Bayesian classifiers were compared for predicting diabetes, cardiovascular disease and hypertension. The data are derived from the Diabetes Screening Complications Research Initiative (DiScRi), conducted at a regional Australian university, and comprise more than 2400 patient records with more than one hundred clinical risk factors (attributes). The results show substantial improvements in the accuracy of each classifier for diagnosing diabetes, cardiovascular disease and hypertension, compared with the accuracy achieved without substituting missing values. The improvement is 7% for diabetes, 21% for cardiovascular disease and 24% for hypertension, and our integrated approach achieves more than 90% accuracy for the diagnosis of any of the three conditions. This work advances data mining research towards an integrated and holistic management of diabetes.
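
The super-problem construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the binary class values, the function names and the record layout are all assumptions made for illustration, and the sketch covers only the Cartesian-product labelling step, not Incomplete Information Dismissal or Data Completion.

```python
from itertools import product

# Assumption: each sub-problem is binary (0 = absent, 1 = present).
# The abstract does not state the actual class values used in the study.
SUB_PROBLEMS = {
    "diabetes": (0, 1),
    "cardiovascular_disease": (0, 1),
    "hypertension": (0, 1),
}


def build_super_classes(sub_problems):
    """Enumerate super-problem classes as the Cartesian product
    of the class values of the underlying sub-problems."""
    names = list(sub_problems)
    combos = product(*(sub_problems[name] for name in names))
    return names, {combo: index for index, combo in enumerate(combos)}


def to_super_label(record, names, super_classes):
    """Map one record's sub-problem labels to a single super-class index.

    Returns None if any sub-problem label is missing, so that incomplete
    records can be handled by a separate step.
    """
    values = tuple(record.get(name) for name in names)
    if any(value is None for value in values):
        return None
    return super_classes[values]


def from_super_label(label, names, super_classes):
    """Recover the individual sub-problem labels from a super-class index."""
    inverse = {index: combo for combo, index in super_classes.items()}
    return dict(zip(names, inverse[label]))


names, super_classes = build_super_classes(SUB_PROBLEMS)

# Hypothetical record: diabetes and hypertension present, no cardiovascular disease.
example = {"diabetes": 1, "cardiovascular_disease": 0, "hypertension": 1}
example_label = to_super_label(example, names, super_classes)
```

With three binary sub-problems the super problem has eight classes, so a single classifier trained on the super label predicts all three conditions at once, and `from_super_label` splits a prediction back into its per-condition parts.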

Keywords: Data mining, missing value imputation, diabetes management, classifiers, diagnosis accuracy.


How to Cite

Jelinek, Herbert F., Andrew Yatsko, Andrew Stranieri, and Sitalakshmi Venkatraman. 2014. “Novel Data Mining Techniques for Incomplete Clinical Data in Diabetes Management”. Current Journal of Applied Science and Technology 4 (33):4591-4606. https://doi.org/10.9734/BJAST/2014/11744.
