guacamole
joint caller: output phasing information
The joint caller should optionally output a CSV file that gives, for each sample and for each pair of variants A and B (both germline and somatic):
- total number of fragments (i.e. reads or mates of reads) overlapping both sites
- total number of fragments overlapping both and supporting either the variant or reference alleles at both sites (i.e. excluding reads supporting a third alternate)
- number of fragments supporting:
  - variant allele for A and reference allele for B
  - reference allele for A and variant allele for B
  - variant alleles for both A and B
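
To make the proposed columns concrete, here is a rough Python sketch of the per-pair counting. It is not guacamole code: the `REF`/`ALT`/`OTHER` labels, the `phasing_counts` and `write_rows` helpers, and the column names are all hypothetical. It assumes each fragment's allele call at a site is already known and keyed by a fragment id shared by a read and its mate.

```python
import csv
from collections import Counter

# Hypothetical labels for the allele a fragment supports at a site.
REF, ALT, OTHER = "ref", "alt", "other"

# Hypothetical column names for the proposed CSV.
FIELDS = [
    "sample", "variant_a", "variant_b",
    "num_overlapping", "num_informative",
    "alt_a_ref_b", "ref_a_alt_b", "alt_a_alt_b",
]


def phasing_counts(calls_a, calls_b):
    """Count fragments by the alleles they support at sites A and B.

    calls_a and calls_b map fragment id -> REF, ALT, or OTHER for the
    fragments overlapping each site.
    """
    # Only fragments overlapping both sites carry phasing information.
    shared = calls_a.keys() & calls_b.keys()
    pairs = Counter((calls_a[f], calls_b[f]) for f in shared)
    # Exclude fragments supporting a third allele at either site.
    informative = sum(
        n for (a, b), n in pairs.items() if a != OTHER and b != OTHER
    )
    return {
        "num_overlapping": len(shared),
        "num_informative": informative,
        "alt_a_ref_b": pairs[(ALT, REF)],
        "ref_a_alt_b": pairs[(REF, ALT)],
        "alt_a_alt_b": pairs[(ALT, ALT)],
    }


def write_rows(rows, out):
    """Write one CSV row per (sample, variant pair) to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Each row passed to `write_rows` would be the dict from `phasing_counts` plus `sample`, `variant_a`, and `variant_b` entries. Keying calls by fragment rather than by read is what lets a read and its mate count once when they cover different sites.
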
One possible application for this data is to constrain phylogeny inference: if all the reads supporting variant A also support variant B, then mutation A probably occurred after B.
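
As a sketch of how a downstream tool might apply that rule to the three counts, assuming a hypothetical `infer_order` helper and an arbitrary `min_support` threshold. It also assumes, beyond what is stated above, that an ordering is only called when the other variant is seen alone on at least one fragment, and it ignores sequencing error entirely:

```python
def infer_order(alt_a_ref_b, ref_a_alt_b, alt_a_alt_b, min_support=3):
    """Guess the relative order of mutations A and B from phased counts.

    Returns "A after B", "B after A", or None when the counts are
    uninformative or conflicting. min_support is a hypothetical cutoff
    on fragments carrying both variants.
    """
    if alt_a_alt_b < min_support:
        return None  # too few fragments carry both variants
    a_alone = alt_a_ref_b > 0  # some fragments have A without B
    b_alone = ref_a_alt_b > 0  # some fragments have B without A
    if b_alone and not a_alone:
        return "A after B"  # A only ever appears together with B
    if a_alone and not b_alone:
        return "B after A"  # B only ever appears together with A
    return None  # always together, or each seen alone: no ordering


print(infer_order(alt_a_ref_b=0, ref_a_alt_b=5, alt_a_alt_b=4))
```

A real implementation would presumably need an error model rather than strict zero checks, since a single miscalled base would otherwise veto an ordering.
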
It might also be useful to implement this logic in varcode/topiary. Presumably, one would want downstream tools to be aware of the presence and relative strandedness of secondary germline/somatic variants within a particular genomic distance (e.g. the length of a PGV peptide, to pick a specific example).