Library Document Clustering Using Machine Learning Based on the K-Means Method
Keywords:
Unsupervised learning, Library books clustering, K-means, keyword-frequency vectors, dimensional term-frequency (TF)
Abstract
Automatic clustering of library materials remains a central task in digital libraries and information-retrieval systems. In this study we investigate the viability of unsupervised clustering for grouping books based solely on keyword-frequency vectors extracted from their metadata and full-text abstracts. A corpus of 100 to 400 books from three distinct disciplines (History, Computer Science, and Biology) was used, with each book represented by a 250-dimensional term-frequency (TF) vector built from a curated controlled vocabulary. The k-means clustering algorithm was applied, and its performance was measured in terms of clustering efficiency (runtime and memory consumption). Results show that k-means is computationally efficient, with runtime and memory consumption depending on the number of books being clustered. The findings demonstrate that keyword-frequency vectors, even in a modest-size collection, provide sufficient discriminative power for reliable unsupervised learning, and that lightweight clustering (k-means) is adequate for most library-automation scenarios.
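The pipeline summarized in the abstract (controlled-vocabulary TF vectors, k-means, and runtime and memory measurement) can be illustrated with a minimal sketch. It is not the authors' implementation: the nine-term vocabulary, the six toy book descriptions, the helper names `tf_vector`, `sq_dist`, and `kmeans`, and the deterministic farthest-point initialization are all assumptions made for illustration. The study itself uses a 250-term vocabulary and 100 to 400 books.

```python
import re
import time
import tracemalloc
from collections import Counter


def tf_vector(text, vocabulary):
    """Count how often each controlled-vocabulary term occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[term]) for term in vocabulary]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on lists of floats."""
    # Assumption of this sketch: deterministic farthest-point initialization,
    # chosen so the toy example is reproducible without a random seed.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data standing in for the 250-term vocabulary and book corpus.
vocabulary = ["empire", "war", "treaty", "algorithm", "compiler",
              "network", "cell", "gene", "protein"]
books = [
    "The empire signed a treaty after the war",            # History
    "War and empire: a treaty that ended the war",         # History
    "A compiler algorithm for network routing",            # Computer Science
    "Algorithm design for compiler and network software",  # Computer Science
    "Gene expression and protein folding in the cell",     # Biology
    "Cell biology: how a gene encodes a protein",          # Biology
]

vectors = [tf_vector(text, vocabulary) for text in books]

# Measure runtime and peak memory of the clustering step only.
tracemalloc.start()
start = time.perf_counter()
labels, centroids = kmeans(vectors, k=3)
elapsed = time.perf_counter() - start
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("cluster labels:", labels)
print(f"runtime: {elapsed:.6f} s, peak memory: {peak_bytes} bytes")
```

On this toy input the two books from each discipline share a cluster label. Timing and memory figures vary by machine and are shown only to indicate how the efficiency measurements described above might be collected.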
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published by Global Research Publication will be distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (CC-BY). So anyone is allowed to copy, distribute, and transmit the article on condition that the original article and source are correctly cited. Authors retain all copyright interest or it is retained by other copyright holders, as appropriate, and agree that the manuscript remains permanently open access in Global Research Publication's site under the terms of the Creative Commons Attribution 4.0 International License (CC-BY). Global Research Publication shall have the right to use and archive the content to create a record and may reformat or paraphrase to benefit the display of the record. For commercial use, we need to know about it.
