Application of the K-Nearest Neighbor (K-NN) Machine Learning Algorithm for the Identification of Colorectal Cancer Based on microRNAs

21 Aug 2021, 12:51 to 12:54 (3 mins)
Presenter: Rifaldy Fajar

Rifaldy Fajar¹, Nana Indri Kurniastuti¹, Prihantini Jupri¹ and Titik Wulandari¹

¹Computational Biology and Medicine Laboratory, Yogyakarta State University, Indonesia

Background/Aims: Colorectal cancer is a malignancy arising in the large intestine, that is, the colon and/or rectum. MicroRNA expression is one candidate marker for colorectal cancer screening. MicroRNAs are short non-coding RNA molecules (about 18 to 25 nucleotides long) that take part in various intracellular processes by regulating gene expression. The microRNAs associated with colorectal cancer that were examined are miR-21, miR-31, miR-135b, miR-183, miR-222, miR-145, and miR-195. In this study, the k-nearest neighbor (k-NN) method was used to classify samples based on their microRNA expression data.

Methods: The microRNA expression data for cancer detection were obtained from the National Cancer Institute Genomic Data Commons. The dataset comprises 600 samples, 300 normal and 300 colorectal cancer, each consisting of the expression levels of the 7 microRNAs and a label (colorectal cancer or normal). The dataset was divided into training and test data using k-fold cross-validation. In the evaluation stage, accuracy, specificity, and sensitivity were calculated. When new data are entered into the system, they are first normalized. The Euclidean distances between the normalized new data and the training data of the best model obtained during testing are then calculated, and the new data are assigned the majority label among their K nearest training samples. The result of this diagnosis process classifies the new data as either normal or colorectal cancer.
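The diagnosis step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the normalization scheme, so min-max scaling with bounds taken from the training data is assumed, and the function names (`fit_min_max`, `min_max_normalize`, `knn_predict`) and the toy expression values are hypothetical. Each sample is a list of 7 expression values, one per microRNA listed in the Background.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return per-feature (min, max) bounds computed from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_normalize(row, bounds):
    """Scale each feature with the training bounds (assumed scheme).

    Constant features map to 0.0; new values outside the training range
    simply fall outside [0, 1].
    """
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, bounds)
    ]


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    # math.dist computes the Euclidean distance between two equal-length sequences.
    nearest = sorted(
        zip(train_rows, train_labels), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up expression values: 7 features, one per microRNA (feature order
# is an assumption). These are NOT real GDC data.
train_raw = [
    [1.0, 2.0, 1.5, 0.5, 1.0, 3.0, 2.5],
    [1.2, 2.1, 1.4, 0.6, 1.1, 2.9, 2.4],
    [4.0, 5.0, 4.5, 3.5, 4.0, 0.5, 0.8],
    [4.2, 5.2, 4.4, 3.6, 4.1, 0.4, 0.7],
]
labels = ["normal", "normal", "colorectal cancer", "colorectal cancer"]

bounds = fit_min_max(train_raw)
train = [min_max_normalize(row, bounds) for row in train_raw]

# A new sample is normalized with the *training* bounds before classification.
new_sample = min_max_normalize([3.9, 4.8, 4.6, 3.4, 3.9, 0.6, 0.9], bounds)
print(knn_predict(train, labels, new_sample, k=3))  # → colorectal cancer
```

Reusing the training set's bounds for new data keeps the query on the same scale as the stored samples. With K = 3 and two classes, the majority vote can never tie.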

Results: The choice of K affects the performance of the k-NN method, measured as accuracy, specificity, and sensitivity. In tests using 10-fold cross-validation, the k-NN method achieved its best accuracy at K = 3, with an accuracy of 94.17%, a specificity of 94.43%, and a sensitivity of 94.41%.
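The model-selection procedure behind these results can be sketched as 10-fold cross-validation over several candidate K values, reporting mean accuracy, sensitivity, and specificity for each K. This is again a hedged illustration rather than the authors' code: treating colorectal cancer as the positive class, stratifying the folds, averaging per-fold metrics (pooling confusion matrices across folds is an equally common choice), min-max normalization, the candidate K list, and the synthetic Gaussian data are all assumptions, and the helper names are hypothetical. Its output says nothing about the reported figures.

```python
import math
import random
from collections import Counter
from statistics import mean

POSITIVE = "colorectal cancer"  # assumed positive class for sensitivity


def knn_predict(train_rows, train_labels, query, k):
    """Majority label among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_rows, train_labels), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def normalize(train_rows, rows):
    """Min-max scale rows using bounds from train_rows only (assumed scheme)."""
    bounds = [(min(c), max(c)) for c in zip(*train_rows)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(r, bounds)]
            for r in rows]


def metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) with POSITIVE as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == POSITIVE and p == POSITIVE for t, p in pairs)
    tn = sum(t != POSITIVE and p != POSITIVE for t, p in pairs)
    fp = sum(t != POSITIVE and p == POSITIVE for t, p in pairs)
    fn = sum(t == POSITIVE and p != POSITIVE for t, p in pairs)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


def stratified_folds(labels, n_folds, seed=0):
    """Deal shuffled indices of each class round-robin so every fold keeps the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_validate(rows, labels, k_values, n_folds=10, seed=0):
    """Mean (accuracy, sensitivity, specificity) for each K, using the same folds for every K."""
    folds = stratified_folds(labels, n_folds, seed)
    results = {}
    for k in k_values:
        per_fold = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(rows)) if i not in held_out]
            train_raw = [rows[i] for i in train_idx]
            # Bounds come from the training part only, so no test data leaks into scaling.
            train_x = normalize(train_raw, train_raw)
            test_x = normalize(train_raw, [rows[i] for i in test_idx])
            train_y = [labels[i] for i in train_idx]
            preds = [knn_predict(train_x, train_y, q, k) for q in test_x]
            per_fold.append(metrics([labels[i] for i in test_idx], preds))
        results[k] = tuple(mean(fold[j] for fold in per_fold) for j in range(3))
    return results


# Synthetic stand-in for the GDC data: 7 features, 50 samples per class (illustrative only).
rng = random.Random(42)
rows = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(50)]
rows += [[rng.gauss(1.5, 1.0) for _ in range(7)] for _ in range(50)]
labels = ["normal"] * 50 + [POSITIVE] * 50

scores = cross_validate(rows, labels, k_values=[1, 3, 5, 7, 9])
best_k = max(scores, key=lambda k: scores[k][0])  # K with the highest mean accuracy
for k, (acc, sens, spec) in scores.items():
    print(f"K={k}: accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
print("best K:", best_k)
```

Only odd K values are tried so that a two-class vote cannot tie, and every K is evaluated on identical folds so the comparison between K values is fair.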

Conclusion: The system can be considered fairly successful, classifying samples with an accuracy above 90%. However, further evaluation of the model is still needed.

Keywords: Colorectal cancer, k-Nearest Neighbor (k-NN), Machine learning, microRNAs, Classification


Presented at APDW 2021.

