Similarity Measure Algorithm for Text Document Clustering, Using Singular Value Decomposition

Valentina Adu; Michael Donkor Adane; Kwadwo Asante

doi:10.9734/cjast/2021/v40i2231475

Published: 2021-09-06

Page: 8-25

Issue: 2021 - Volume 40 [Issue 22]

Valentina Adu *

ICT Directorate, Kumasi Technical University, P.O. Box 854, Kumasi, Ghana.

Michael Donkor Adane

Department of Information Technology, Akatsi College of Education, P. O. Box PMB, Akatsi, Ghana.

Kwadwo Asante

Department of Information Technology Education, Akenten Appiah-Menkam University of Skills Training and Entrepreneurial Development, Kumasi Technical University, P.O. Box 1277, Kumasi, Ghana.

*Author to whom correspondence should be addressed.

Abstract

We examined a similarity measure for text document clustering. Data mining is a challenging field with many research and application areas. Text document clustering, a subset of data mining, groups and organizes a large quantity of unstructured text documents into a small number of meaningful clusters. An algorithm that calculates the degree of closeness between documents from their document matrix was used to query the terms/words in each document. We also determined whether the documents in a given set are similar to or different from one another when these terms are queried. We found that the ability to rank and approximate documents using a matrix allows the use of Singular Value Decomposition (SVD) as an enhanced text data mining algorithm. In addition, applying SVD to a high-dimensional matrix yields a lower-dimensional matrix that exposes the relationships in the original matrix by ordering them from the most variant to the least.
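The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the raw term-count weighting, the whitespace tokenizer, the use of cosine similarity as the closeness measure, the choice of rank k, and all function names (term_document_matrix, reduced_document_vectors, cosine_similarity_matrix) are assumptions made for illustration only.

```python
import numpy as np


def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i, j] counts term i in document j.

    Assumption: plain whitespace tokenization and raw term counts.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            matrix[index[tok], j] += 1.0
    return vocab, matrix


def reduced_document_vectors(matrix, k):
    """Project documents onto the k largest singular directions.

    np.linalg.svd returns singular values in descending order, so the
    first k components carry the most variation in the original matrix.
    """
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k, :]).T  # one row per document


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    unit = vectors / norms
    return unit @ unit.T


# Hypothetical toy corpus: two documents about pets, two about markets.
docs = [
    "cat dog pet",
    "dog cat animal",
    "stock market price",
    "market price trade",
]
vocab, A = term_document_matrix(docs)
doc_vectors = reduced_document_vectors(A, k=2)
similarity = cosine_similarity_matrix(doc_vectors)
print(np.round(similarity, 2))
```

In this toy corpus the two pet documents share terms with each other but not with the two market documents, so their similarity in the reduced space should be high while the cross-topic similarity stays near zero. A clustering step could then group documents by these similarity scores.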

Keywords: Data mining, similarity, term frequency, singular value decomposition, clustering

How to Cite

Adu, Valentina, Michael Donkor Adane, and Kwadwo Asante. 2021. “Similarity Measure Algorithm for Text Document Clustering, Using Singular Value Decomposition”. Current Journal of Applied Science and Technology 40 (22):8-25. https://doi.org/10.9734/cjast/2021/v40i2231475.