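A recent study by Jerome Kelleher's team at the University of Oxford has developed a new algorithm that can infer whole-genome histories in large population datasets. The work was published in Nature Genetics in September 2019.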
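The researchers developed an algorithm that not only infers whole-genome histories with accuracy comparable to state-of-the-art methods, but can also process four orders of magnitude more sequences. The approach also provides an "evolutionary encoding" of the data, which enables efficient calculation of relevant statistics. The team applied the method to human data from the 1000 Genomes Project, the Simons Genome Diversity Project and UK Biobank, and found that the inferred genealogies are rich in biological signal and efficient to process.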
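To give a rough sense of why a genealogical encoding can make statistics cheap to compute, here is a minimal toy sketch. It is not the paper's algorithm or data format: the single fixed tree, the `parent` array, and the helper names (`samples_below`, `frequencies_from_tree`, `genotype_matrix`) are all assumptions made up for illustration. The idea shown is that if each variant is stored as a mutation on a branch of a tree, its allele frequency follows from one count of the samples below that branch, with no need to store or scan the full genotype matrix.

```python
# Toy illustration only: one fixed tree, not the paper's algorithm or data format.
# parent[u] is the parent of node u; -1 marks the root. Nodes 0-3 are samples.
parent = [4, 4, 5, 5, 6, 6, -1]
num_samples = 4

# Hypothetical sites: each records the node above which its single mutation sits.
mutation_node = [4, 2, 5]


def samples_below(parent, num_samples):
    """Count the samples descending from every node in a single pass."""
    count = [1 if u < num_samples else 0 for u in range(len(parent))]
    # In this toy tree children have smaller ids than their parents,
    # so visiting nodes in ascending order is a bottom-up traversal.
    for u in range(len(parent)):
        if parent[u] != -1:
            count[parent[u]] += count[u]
    return count


def frequencies_from_tree(parent, num_samples, mutation_node):
    """Derived-allele frequency per site, read directly off the tree."""
    count = samples_below(parent, num_samples)
    return [count[node] / num_samples for node in mutation_node]


def genotype_matrix(parent, num_samples, mutation_node):
    """Expand the tree encoding into an explicit sites x samples 0/1 matrix."""
    rows = []
    for node in mutation_node:
        row = []
        for s in range(num_samples):
            u = s
            # Walk towards the root; the sample carries the mutation
            # if the mutated node lies on that path.
            while u != -1 and u != node:
                u = parent[u]
            row.append(1 if u == node else 0)
        rows.append(row)
    return rows


def frequencies_from_matrix(matrix):
    """The same statistic computed the conventional way, row by row."""
    return [sum(row) / len(row) for row in matrix]


tree_freqs = frequencies_from_tree(parent, num_samples, mutation_node)
matrix_freqs = frequencies_from_matrix(
    genotype_matrix(parent, num_samples, mutation_node)
)
print(tree_freqs, matrix_freqs)
```

Both routes give the same frequencies. The tree route visits each node once and then does one lookup per site, whereas the matrix route touches every sample at every site. In this sketch, that difference is the only sense in which the encoding is "efficient".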
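According to the authors, inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because that history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques can process no more than a hundred samples. As datasets consisting of millions of genomes are now being collected, scalable and efficient inference methods are needed to make full use of these resources.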
Appendix: original English text
Title: Inferring whole-genome histories in large population datasets
Author: Jerome Kelleher, Yan Wong, Anthony W. Wohns, Chaimaa Fadil, Patrick K. Albers, Gil McVean
Issue&Volume: Volume 51 Issue 9
Abstract: Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that is able to not only infer whole-genome histories with comparable accuracy to the state-of-the-art but also process four orders of magnitude more sequences. The approach also provides an evolutionary encoding of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
DOI: 10.1038/s41588-019-0483-y
Source: https://www.nature.com/articles/s41588-019-0483-y
Nature Genetics: founded in 1992 and published by Springer Nature. Latest impact factor (IF): 25.455
Official website: https://www.nature.com/ng/
Submission link: https://mts-ng.nature.com/cgi-bin/main.plex