A Design of an Automatic Web Page Classification System

Tarek M. Mahmoud *

Department of Computer Science, Faculty of Science, Minia University, El-Minia, Egypt.

Doha Taha Nour El-Deen

MISR University for Science and Technology, 6th of October City, Egypt.

Tarek Abd- El-Hafeez

Department of Computer Science, Faculty of Science, Minia University, El-Minia, Egypt.

*Author to whom correspondence should be addressed.


Abstract

Web page classification is a common problem on today's Internet. In this paper, an automatic web page classification system is introduced. The proposed system aims to increase the accuracy of web page classification by combining the well-known Naïve Bayesian algorithm, Support Vector Machine, and K-Nearest Neighbor. The experimental results show that classifying web pages with the hybrid of the Naïve Bayesian, Support Vector Machine, and K-Nearest Neighbor classifiers reduces the false positive rate and achieves higher accuracy than using any one of them alone, including Naïve Bayesian, which is often chosen for its accuracy and speed. The experiments, run on 10,000 web pages (30% for training and 70% for testing), showed high efficiency, with an average false positive rate of 0%, an average true positive rate of 1%, an average F-measure of 1%, and an average overall accuracy of 99.98%.

Keywords: Web page classification, Naïve Bayesian algorithm, Support Vector Machine, K-Nearest Neighbor
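
The abstract does not say how the three classifiers' outputs are combined, so the following is only a minimal sketch that assumes a simple majority vote over per-page labels. The function names `majority_vote` and `binary_metrics`, the example labels, and the voting rule itself are illustrative assumptions, not the authors' method. The second function shows one standard way to compute the reported measures (false positive rate, true positive rate, F-measure, accuracy) for a single category treated as the positive class.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine label lists from several classifiers by majority vote.

    Assumption: the paper's combination rule is not given in the abstract;
    majority voting is used here purely for illustration.
    """
    combined = []
    for labels in zip(*predictions):
        # Pick the most frequent label for this page; on a tie, the label
        # that appears first (from the earlier classifier) wins.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive):
    """Compute TPR, FPR, F-measure and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_measure = (2 * precision * tpr / (precision + tpr)
                 if (precision + tpr) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tpr": tpr, "fpr": fpr, "f_measure": f_measure,
            "accuracy": accuracy}


# Hypothetical predictions from the three classifiers on four pages.
nb_pred = ["sports", "news", "sports", "news"]
svm_pred = ["sports", "sports", "news", "news"]
knn_pred = ["news", "sports", "sports", "news"]

hybrid_pred = majority_vote(nb_pred, svm_pred, knn_pred)
metrics = binary_metrics(["sports", "sports", "news", "news"],
                         hybrid_pred, positive="sports")
```

With three voters and two labels a tie cannot occur; with more categories, the tie-breaking order above effectively gives priority to the first classifier passed in, which is one design choice among several.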


How to Cite

Mahmoud, Tarek M., Doha Taha Nour El-Deen, and Tarek Abd- El-Hafeez. 2017. “A Design of an Automatic Web Page Classification System”. Current Journal of Applied Science and Technology 18 (6):1-14. https://doi.org/10.9734/BJAST/2016/30376.
