PAKISTAN VETERINARY JOURNAL
     
 
Pak Vet J, 2026, 46(4): 816-829
 
K-mer-based Machine Learning Method to Classify Echinococcus granulosus S.L. Mitochondrial DNA Sequences
 
Enas Al-khlifeh1*, Ahmad B. Hassanat2, Suleyman A. AlShowarah2,3, Ahmad S. Tarawneh2, Lujain A Alhasanat4 and Awni Hammouri2

1Applied Biology Department, Faculty of Science, Al-Balqa Applied University, Al-Salt, Jordan
2Faculty of Information Technology, Mutah University, Al-Karak, Jordan
3Software Engineering Department, Faculty of Science and Information Technology, Al-Zaytoonah University of Jordan, Jordan
4Faculty of Medicine, Mutah University, Karak, 61710, Jordan

*Corresponding author: Al-khlifeh.en@bau.edu.jo

Abstract   

The taxonomy of the Echinococcus granulosus sensu lato (s.l.) complex remains debated, with the species status of genotypes G1/G3 (E. granulosus s.s.) and the G6–G10 cluster (E. canadensis) being particularly contested. To address this challenge, we developed a k-mer-based machine learning (ML) method to predict E. granulosus s.l. genotypes from the mitochondrial DNA (mtDNA) sequence data available in GenBank. We used 7-mer representations of mtDNA to reveal untapped diversity across 948 sequences from seven known genotypes of E. granulosus s.l. We also evaluated the utility of varying k-mer lengths, including fixed versus adaptive lengths, for sequence comparison. Principal component analysis (PCA) was used to identify the most discriminative patterns and to address the computational challenges posed by high-dimensional features; the reduced features were then fed into five distinct ML algorithms, including both conventional (LR, RF and SVM) and deep learning (DL; 1D-CNN and LSTM) methods. Performance and efficiency were evaluated using 10-fold cross-validation. Our method attains approximately 95% accuracy in predicting E. granulosus s.l. genotypes. High genetic diversity within G6 and G8, and within G3, contributes significantly to the contested taxonomy of the E. canadensis cluster and E. granulosus s.s., respectively. Moreover, ML can potentially compete with BLAST (the NCBI sequence alignment tool) when a BLAST-Similarity-KNN classifier is applied to the E. granulosus mtDNA data. This study provides a novel approach for classifying E. granulosus s.l. at the species level, thereby supporting disease control activities.
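As a rough illustration of the k-mer representation described in the abstract, the following minimal Python sketch turns a DNA sequence into a fixed-length k-mer frequency vector. It is not the authors' implementation: the function names `kmer_counts` and `kmer_vector`, the normalization to relative frequencies, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_counts(seq, k=7):
    """Count overlapping k-mers in seq.

    Assumption: windows containing characters other than A, C, G, T
    (for example the ambiguity code N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts

def kmer_vector(seq, k=7):
    """Return relative k-mer frequencies over all 4**k possible k-mers.

    The k-mers are ordered lexicographically, so every sequence maps
    to a vector of the same length (16,384 entries for k=7).
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    vocab = ("".join(p) for p in product(ALPHABET, repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]

# Toy usage with a made-up fragment (not real E. granulosus data).
toy_sequence = "ATGCGTATGCGTANATGC"
vector = kmer_vector(toy_sequence, k=3)
```

Vectors of this kind are high-dimensional and sparse for k=7, which is presumably why a dimensionality-reduction step such as PCA precedes the classifiers in the workflow summarized above.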

To Cite This Article: Al-khlifeh E, Hassanat AB, AlShowarah SA, Tarawneh AS, Alhasanat LA and Hammouri A, 2026. K-mer-based machine learning method to classify Echinococcus granulosus S.L. mitochondrial DNA sequences. Pak Vet J, 46(4): 816-829. http://dx.doi.org/10.29261/pakvetj/2026.076

 
 
   
 

ISSN 0253-8318 (Print)
ISSN 2074-7764 (Online)


