Analyses and comparison of K-nearest neighbour and AdaBoost algorithms
for genotype imputation
|
Abbas Mikhchi1, Mahmood Honarvar2, Nasser Emam Jomeh Kashan1*,
Saeed Zerehdaran3 and Mehdi Aminafshar1
1Department
of Animal Science, Science and Research Branch, Islamic Azad University,
Tehran, Iran; 2Department of Animal Science, Shahr-e-Qods
Branch, Islamic Azad University, Tehran, Iran; 3Department of
Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
|
|
Abstract
Genomic selection has become a standard tool in dairy cattle breeding.
However, for other animal species, implementation of this technology is
hindered by the high cost of genotyping. Genotypic imputation is defined
as the prediction of genotypes for both unrelated individuals and
parent-offspring trios at the single nucleotide polymorphism (SNP)
locations in a sample of individuals for which assays are not directly
available. Several imputation methods designed for livestock
populations are available. Machine learning methods have been
used in genetic studies to build models capable of predicting missing
values of a marker. In this study, strategies and factors affecting the
imputation accuracy of parent-offspring trios were compared using two
machine learning methods, namely K-nearest neighbour (KNN) and AdaBoost
(AB). The methods were applied to simulated data to impute the un-typed
SNPs in parent-offspring trios. Two datasets, D1 (100 trios with 5k
SNPs) and D2 (500 trios with 5k SNPs), were simulated. The methods were
compared in terms of imputation accuracy, computation time, and the
effect of sample size on imputation accuracy. KNN outperformed AB in
imputation accuracy. Computation time also differed between the
methods, with KNN being the faster algorithm. Accuracy of imputation
increased with the number of trios. Results on the simulated datasets
showed that both methods performed well for imputation of un-typed SNPs
and can be used as an alternative to other methods for imputation of
parent-offspring trios.
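
As a rough illustration of how a nearest-neighbour approach can impute an un-typed SNP, the sketch below predicts a missing genotype by majority vote among the k reference individuals most similar to the target at the typed SNPs. It is not the procedure used in the study: the 0/1/2 genotype coding, the Hamming distance, the toy reference panel, and the names hamming and knn_impute are all assumptions made for this example, and the trio structure is ignored.

```python
from collections import Counter


def hamming(a, b, cols):
    """Count mismatching genotype codes over the given SNP columns."""
    return sum(a[c] != b[c] for c in cols)


def knn_impute(reference, target, missing_col, k=3):
    """Impute target[missing_col] by majority vote among the k reference
    individuals closest to the target on its typed SNPs (hypothetical helper)."""
    # Columns where the target has an observed genotype.
    typed = [c for c in range(len(target))
             if c != missing_col and target[c] is not None]
    # Sort reference individuals by distance to the target (stable sort,
    # so ties keep their order in the panel).
    ranked = sorted(reference, key=lambda row: hamming(row, target, typed))
    # Majority vote over the genotypes of the k nearest individuals.
    votes = Counter(row[missing_col] for row in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy reference panel (assumed data): genotypes coded as 0, 1 or 2.
reference = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 2],
    [2, 2, 0, 2],
    [2, 1, 0, 2],
]

target = [0, 0, 1, None]  # the last SNP is un-typed
imputed = knn_impute(reference, target, missing_col=3, k=3)
print(imputed)
```

Imputation accuracy in this setting would be the fraction of masked genotypes recovered correctly, which is why a larger reference panel (more trios) tends to help: the nearest neighbours are more likely to resemble the target closely.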
|
Keywords: Trios; machine learning methods; imputation accuracy; computation time
|
To cite this article:
Mikhchi A, M Honarvar, N Emam Jomeh Kashan, S Zerehdaran and M
Aminafshar, 2015. Analyses and comparison of k-nearest neighbour and
AdaBoost algorithms for genotype imputation. Res. Opin. Anim. Vet.
Sci., 5(7): 295-299.