lineage
Add ability to reconstruct genomes
Combine the techniques identified by Whit Athey in "Phasing the Chromosomes of a Family Group When One Parent is Missing" with the results of find_shared_dna to reconstruct the genomes of maternal and/or paternal ancestors.
This can be approached as a constraint satisfaction problem. For example, the algorithm could be given several individuals along with their maternal and/or paternal relationships (e.g., siblings = [ind1, ind2]; mother = [ind3]; paternal_relation = [ind4]). Shared DNA could then be discovered by running find_shared_dna on every combination of individuals. This information (whether each combination of individuals shares one chromosome, both chromosomes, or no chromosomes at a given SNP position) would serve as the constraints for reconstructing the ancestral genomes.
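A minimal sketch of how those pairwise constraints might be collected, assuming a hypothetical sharing_at callback that reports the sharing state for a pair at one SNP position. The Sharing enum, pairwise_constraints, and sharing_at are illustrative names only; they are not part of lineage, and this does not reflect the actual interface or output of find_shared_dna.

```python
from enum import Enum
from itertools import combinations


class Sharing(Enum):
    """Hypothetical sharing states for a pair of individuals at one SNP."""
    NONE = 0  # no chromosomes shared
    ONE = 1   # one chromosome shared
    BOTH = 2  # both chromosomes shared


def pairwise_constraints(individuals, sharing_at):
    """Build one constraint per unordered pair of individuals.

    `sharing_at(a, b)` is an assumed callback that returns a Sharing value
    for the pair at the SNP position of interest (e.g., derived from
    shared DNA segments found earlier).
    """
    return {(a, b): sharing_at(a, b) for a, b in combinations(individuals, 2)}


# Usage with a stub callback that says every pair shares one chromosome.
individuals = ["ind1", "ind2", "ind3", "ind4"]
constraints = pairwise_constraints(individuals, lambda a, b: Sharing.ONE)
```

With four individuals this yields six pairwise constraints, which a solver could then use to prune candidate ancestral genotypes.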
As a simple example, say two siblings have genotypes of CA and AG at a given SNP. If one knew they shared one chromosome at that location, the shared allele would have to be A, the only allele common to both genotypes. AN could then be attributed to one parent, where N is any allele, and CG to the other, since that parent passed a different chromosome to each sibling. Additional comparisons between other individuals could further narrow the solution space for the ancestral genomes.
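The sibling example can be sketched as a small function. This is only an illustration under stated assumptions: genotypes are unphased two-character strings with no missing calls or genotyping errors, the siblings are known to share exactly one chromosome at the SNP, and parental_candidates is a hypothetical helper rather than anything in lineage. It cannot say which parent is which.

```python
def parental_candidates(sib1, sib2):
    """List candidate (parent_a, parent_b) genotypes at one SNP.

    Assumes the two siblings share exactly one chromosome here.
    parent_a passed on the shared allele (its other allele is unknown, "N");
    parent_b passed a different chromosome to each sibling.
    """
    candidates = []
    # Any allele present in both genotypes could be the shared one.
    for shared in sorted(set(sib1) & set(sib2)):
        # Remove one copy of the shared allele to get each sibling's
        # allele from the other parent.
        rest1 = sib1.replace(shared, "", 1)
        rest2 = sib2.replace(shared, "", 1)
        candidates.append((shared + "N", "".join(sorted(rest1 + rest2))))
    return candidates


print(parental_candidates("CA", "AG"))  # [('AN', 'CG')]
print(parental_candidates("AG", "AG"))  # [('AN', 'GG'), ('GN', 'AA')]
```

The second call shows why more comparisons help: when both siblings are AG, a single SNP leaves two candidate solutions, and constraints from other relatives would be needed to choose between them.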
Consider integrating https://github.com/poruloh/Eagle
This only seems useful if no familial DNA is available - IBD gives a much more conclusive result for phasing than statistical methods.
@ebacherdom, I agree. As discussed above, I think using the results of find_shared_dna would help with this, especially when more comparisons between individuals in a family group are available. Formally, I think this is a constraint satisfaction problem.