lineage
Add ability to reconstruct genomes
Combine the techniques described by Whit Athey in "Phasing the Chromosomes of a Family Group When One Parent is Missing" with the results of find_shared_dna to reconstruct the genomes of maternal and/or paternal ancestors.
This can be approached as a constraint satisfaction problem. For example, the algorithm could be given several individuals along with their maternal and/or paternal relationships (e.g., siblings = [ind1, ind2]; mother = [ind3]; paternal_relation = [ind4]). Shared DNA could then be found by running find_shared_dna on every pair of individuals. This information (whether each pair shares one chromosome, both chromosomes, or neither chromosome at a given SNP position) would serve as the constraints for reconstructing the ancestral genomes.
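A minimal sketch of the constraint-satisfaction framing at a single SNP, assuming two parents, unphased two-allele genotypes, and that the find_shared_dna results have already been reduced to a per-SNP count of shared parental chromosomes (0, 1, or 2) for each pair of children. The function name consistent_parents and the layout of the ibd dictionary are hypothetical, not part of lineage. The search is plain brute-force enumeration over the parents' alleles and each child's inheritance choices, which stays small per SNP for a few siblings.

```python
from itertools import product

# assumption: only the four nucleotide alleles are considered (no indels or no-calls)
ALLELES = "ACGT"


def consistent_parents(children, ibd, mother=None):
    """Brute-force constraint satisfaction at a single SNP (hypothetical helper).

    children: unphased child genotypes, e.g. ["CA", "AG"]
    ibd: {(i, j): n} with i < j, where n is the number of parental chromosomes
         children i and j share at this SNP (0, 1 or 2); assumed to be derived
         from find_shared_dna output
    mother: optional known maternal genotype, e.g. "AT"

    Returns a set of (maternal, paternal) genotypes as sorted allele tuples.
    """
    solutions = set()
    for m in product(ALLELES, repeat=2):  # mother's chromosomes 0 and 1
        if mother is not None and sorted(m) != sorted(mother):
            continue
        for f in product(ALLELES, repeat=2):  # father's chromosomes 0 and 1
            # each child inherits chromosome 0 or 1 from each parent
            for picks in product(product((0, 1), repeat=2), repeat=len(children)):
                # constraint 1: inherited alleles reproduce each child's genotype
                if any(
                    sorted((m[mi], f[fi])) != sorted(g)
                    for (mi, fi), g in zip(picks, children)
                ):
                    continue
                # constraint 2: shared-chromosome counts match the IBD states
                if all(
                    (picks[i][0] == picks[j][0]) + (picks[i][1] == picks[j][1]) == n
                    for (i, j), n in ibd.items()
                ):
                    solutions.add((tuple(sorted(m)), tuple(sorted(f))))
    return solutions


if __name__ == "__main__":
    # two siblings sharing one chromosome at this SNP: both orientations remain
    for maternal, paternal in sorted(consistent_parents(["CA", "AG"], {(0, 1): 1})):
        print("mother", maternal, "father", paternal)
    # adding a known mother genotype narrows the solution space
    print(consistent_parents(["CA", "AG"], {(0, 1): 1}, mother="AT"))
```

For two siblings genotyped CA and AG who share one chromosome, the search keeps both parental orientations (one parent carrying A plus any allele, the other CG); fixing the mother's genotype at AT leaves the single solution of mother AT, father CG. Brute force keeps the sketch short; chaining SNPs along shared segments or scaling to larger families would likely call for constraint propagation or a dedicated solver.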
As a simple example, say two siblings have genotypes of CA and AG at a given SNP. If they are known to share exactly one chromosome at that location, the shared chromosome must carry A, the only allele common to both genotypes. AN can then be attributed to one parent and CG to the other, where N is an unknown allele. Additional comparisons between other individuals could further narrow the solution space for the ancestral genomes.
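The reasoning in this example can be written out directly. Because the siblings share exactly one parental chromosome, that chromosome must carry their one common allele, and each sibling's remaining allele must come from a different chromosome of the other parent. A minimal sketch, assuming unphased two-letter genotype strings; split_ibd1 is a hypothetical helper, not a lineage function:

```python
def split_ibd1(g1, g2):
    """Attribute alleles to parents for two siblings who share exactly one
    parental chromosome (IBD1) at a SNP. Hypothetical helper, not lineage API.

    Returns (shared_parent, other_parent) genotype strings, with "N" marking
    the unknown allele on the shared parent's untransmitted chromosome.
    """
    common = set(g1) & set(g2)
    if len(common) != 1:
        # no common allele contradicts IBD1; two common alleles are ambiguous
        raise ValueError("cannot attribute alleles: need exactly one allele in common")
    (shared,) = common
    # each sibling's remaining allele came from a different chromosome
    # of the other parent, so together they form that parent's genotype
    rest1 = g1.replace(shared, "", 1)
    rest2 = g2.replace(shared, "", 1)
    return shared + "N", "".join(sorted(rest1 + rest2))


print(split_ibd1("CA", "AG"))  # ('AN', 'CG')
```

Homozygous cases work the same way: AA and AG give AN and AG. When both siblings have the same heterozygous genotype (e.g. AG and AG), either allele could be the shared one, which is where comparisons with additional relatives would be needed.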
Consider integrating https://github.com/poruloh/Eagle
This only seems useful if no familial DNA is available; IBD gives much more conclusive phasing results than statistical methods.
@ebacherdom, I agree. As discussed above, I think using the results of find_shared_dna would help with this, especially when comparisons between more individuals in a family group are available. Formally, I think this is a constraint satisfaction problem.