Add ability to reconstruct genomes

Open apriha opened this issue 7 years ago • 3 comments

Combine the techniques identified by Whit Athey in "Phasing the Chromosomes of a Family Group When One Parent is Missing" with the results of find_shared_dna to reconstruct the genomes of maternal and/or paternal ancestors.

This can be approached as a constraint satisfaction problem. For example, the algorithm could be provided several individuals, with the maternal and/or paternal relationships also identified (e.g., siblings = [ind1, ind2]; mother = [ind3]; paternal_relation = [ind4]). Then, shared DNA could be discovered by find_shared_dna between all combinations of individuals. This information - whether the various combinations of individuals share one chromosome, both chromosomes, or no chromosomes for a given SNP position - would serve as the constraints for reconstructing the ancestral genomes.
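A minimal sketch of how the pairwise comparisons could be collected into constraints. The names `family`, `pairwise_constraints`, and `stub_compare` are hypothetical, and `compare` is only a stand-in for whatever wraps find_shared_dna; the shape of its return value here is an assumption, not the library's actual output.

```python
from itertools import combinations

# Relationship labels as in the example above; the dict layout is an assumption.
family = {
    "siblings": ["ind1", "ind2"],
    "mother": ["ind3"],
    "paternal_relation": ["ind4"],
}


def pairwise_constraints(family, compare):
    """Run compare(a, b) on every unordered pair of individuals.

    `compare` is a caller-supplied function (hypothetical here) that reports
    where two individuals share one chromosome, both chromosomes, or none.
    """
    individuals = [name for group in family.values() for name in group]
    return {pair: compare(*pair) for pair in combinations(individuals, 2)}


def stub_compare(a, b):
    # Placeholder result: empty lists of shared segments (assumed shape).
    return {"one_chrom": [], "two_chrom": []}


constraints = pairwise_constraints(family, stub_compare)
```

Four individuals give six pairs, so each added family member adds comparisons that can further restrict the ancestral genotypes.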

As a simple example, say two siblings have genotypes CA and AG at a given SNP. If one knew they shared one chromosome at that location, the shared allele would have to be A, the only allele the two genotypes have in common. AN could then be attributed to one parent and CG to the other, where N is any allele. Additional comparisons between other individuals could further narrow the solution space for the ancestral genomes.
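The sibling example can be checked by brute force. This is only a sketch of the constraint satisfaction idea for a single SNP, not a proposed implementation; `candidate_parents` is a hypothetical name, and it assumes each sibling inherits exactly one allele from each parent and that `shared` counts the parental chromosomes the siblings have in common at that position.

```python
from itertools import product

ALLELES = "ACGT"


def candidate_parents(sib1, sib2, shared):
    """Return parent genotype pairs consistent with two sibling genotypes.

    `shared` is the number of parental chromosomes (0, 1, or 2) the siblings
    inherited in common at this SNP. Each solution is a sorted tuple of two
    genotype strings; which parent is which is not determined.
    """
    target1, target2 = sorted(sib1), sorted(sib2)
    solutions = set()
    for p, q in product(product(ALLELES, repeat=2), repeat=2):
        # i, j: chromosomes sibling 1 received from parents p and q;
        # k, l: the same for sibling 2.
        for i, j, k, l in product((0, 1), repeat=4):
            if (i == k) + (j == l) != shared:
                continue
            if sorted((p[i], q[j])) == target1 and sorted((p[k], q[l])) == target2:
                genotypes = ("".join(sorted(p)), "".join(sorted(q)))
                solutions.add(tuple(sorted(genotypes)))
    return solutions


solutions = candidate_parents("CA", "AG", shared=1)
# One parent is A plus any allele (AA, AC, AG, AT); the other is CG.
```

Enumerating every assignment is only practical for a single position; it is meant to show how the shared-chromosome constraint prunes the solution space.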

apriha avatar Nov 19 '17 00:11 apriha

Consider integrating https://github.com/poruloh/Eagle

apriha avatar Mar 03 '18 20:03 apriha

> Consider integrating https://github.com/poruloh/Eagle

This only seems useful if no familial DNA is available - IBD gives a much more conclusive result for phasing than statistical methods.

ebacherdom avatar May 28 '19 14:05 ebacherdom

@ebacherdom, I agree. As discussed above, I think using the results of find_shared_dna would help with this, especially when more comparisons between individuals in a family group are available. Formally, I think this is a constraint satisfaction problem.

apriha avatar Jun 15 '19 16:06 apriha