E-ISSN 2223-0343

Analyses and comparison of K-nearest neighbour and AdaBoost algorithms for genotype imputation

Abbas Mikhchi1, Mahmood Honarvar2, Nasser Emam Jomeh Kashan1*, Saeed Zerehdaran3 and Mehdi Aminafshar1

1Department of Animal Science, Science and Research Branch, Islamic Azad University, Tehran, Iran; 2Department of Animal Science, Shahr-e-Qods Branch, Islamic Azad University, Tehran, Iran; 3Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran

 
Abstract

Genomic selection has become a standard tool in dairy cattle breeding. For other animal species, however, implementation of this technology is hindered by the high cost of genotyping. Genotype imputation is defined as the prediction of genotypes, for both unrelated individuals and parent-offspring trios, at single nucleotide polymorphism (SNP) locations in a sample of individuals for which assays are not directly available. Several imputation methods designed for livestock populations are available. Machine learning methods have been used in genetic studies to build models capable of predicting missing values of a marker. In this study, strategies and factors affecting the imputation accuracy of parent-offspring trios were compared using two machine learning methods, namely K-nearest neighbour (KNN) and AdaBoost (AB). The methods were applied to simulated data to impute the un-typed SNPs in parent-offspring trios. Two datasets, D1 (100 trios with 5k SNPs) and D2 (500 trios with 5k SNPs), were simulated. The methods were compared in terms of imputation accuracy, computation time, and the effect of sample size on imputation accuracy. KNN outperformed AB in imputation accuracy. Computation time differed between the methods, and KNN was the faster algorithm. Imputation accuracy increased with the number of trios. Results on the simulated datasets showed that both methods performed very well for imputation of un-typed SNPs and can be used as an alternative to other methods for imputation of parent-offspring trios.
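As a rough illustration of the KNN idea described above, the following minimal Python sketch imputes one un-typed SNP by majority vote among the most similar reference individuals and then scores the result as the proportion of correctly imputed genotypes. It is not the authors' implementation: the 0/1/2 genotype coding, the mismatch-count distance, the value k = 3, the toy data, and the function names `hamming`, `knn_impute` and `imputation_accuracy` are all assumptions made for this example, and the sketch does not use the trio (pedigree) structure.

```python
from collections import Counter


def hamming(a, b, skip):
    """Count mismatching genotypes between two vectors, ignoring position `skip`."""
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if i != skip and x != y)


def knn_impute(reference, target, snp_index, k=3):
    """Predict target[snp_index] by majority vote among the k nearest reference rows.

    Distance is the mismatch count over all other SNPs (an assumed choice).
    """
    ranked = sorted(reference, key=lambda row: hamming(row, target, snp_index))
    votes = Counter(row[snp_index] for row in ranked[:k])
    return votes.most_common(1)[0][0]


def imputation_accuracy(predicted, truth):
    """Proportion of imputed genotypes that match the true genotypes."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy data: rows are individuals, columns are SNPs coded 0/1/2.
reference = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 2],
    [2, 2, 0, 2],
    [2, 1, 0, 2],
]

# Individuals whose last SNP (index 3) is un-typed, with the masked true values.
targets = [[0, 0, 1, None], [2, 2, 0, None]]
truth = [0, 2]

predicted = [knn_impute(reference, t, snp_index=3, k=3) for t in targets]
accuracy = imputation_accuracy(predicted, truth)
print(predicted, accuracy)
```

In a comparison such as the one reported here, the same masking-and-scoring step would be repeated for each method and each dataset size, so that accuracy and run time are measured on identical un-typed SNPs.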

Keywords: Trios; machine learning methods; imputation accuracy; computation time
 
To cite this article: Mikhchi A, M Honarvar, N Emam Jomeh Kashan, S Zerehdaran and M Aminafshar, 2015. Analyses and comparison of k-nearest neighbour and AdaBoost algorithms for genotype imputation. Res. Opin. Anim. Vet. Sci., 5(7): 295-299.
 

