
joint caller: output phasing information

Open • timodonnell opened this issue 8 years ago • 1 comment

The joint caller should optionally output a CSV file that gives, for each sample and each pair A, B of variants (germline or somatic):

  • total number of fragments (i.e., reads or mates of reads) overlapping both sites
  • total number of fragments overlapping both sites and supporting either the variant or the reference allele at each site (i.e., excluding fragments that support a third allele)
  • number of fragments supporting:
    • the variant allele at A and the reference allele at B
    • the reference allele at A and the variant allele at B
    • the variant alleles at both A and B
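A minimal sketch of how the per-pair counts above could be tallied and written as CSV rows. It assumes a hypothetical upstream step has already reduced each fragment to a dict mapping variant ID to the observed allele ("ref", "alt", or "other" for a third allele), with sites the fragment does not cover left out. `Fragment`, `PairCounts`, `count_pair`, and `write_csv` are illustrative names, not part of the joint caller.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Dict, Iterable, TextIO

# Hypothetical fragment representation: variant ID -> "ref" | "alt" | "other".
# A site the fragment does not overlap is absent from the dict.
Fragment = Dict[str, str]


@dataclass
class PairCounts:
    sample: str
    variant_a: str
    variant_b: str
    overlapping: int = 0   # fragments overlapping both sites
    informative: int = 0   # overlapping both, and ref or alt at each site
    alt_a_ref_b: int = 0
    ref_a_alt_b: int = 0
    alt_a_alt_b: int = 0


def count_pair(sample: str, a: str, b: str,
               fragments: Iterable[Fragment]) -> PairCounts:
    """Tally the phasing counts for one sample and one variant pair (a, b)."""
    counts = PairCounts(sample, a, b)
    for frag in fragments:
        if a not in frag or b not in frag:
            continue  # fragment does not span both sites
        counts.overlapping += 1
        allele_a, allele_b = frag[a], frag[b]
        if allele_a not in ("ref", "alt") or allele_b not in ("ref", "alt"):
            continue  # supports a third allele at one of the sites
        counts.informative += 1
        if allele_a == "alt" and allele_b == "ref":
            counts.alt_a_ref_b += 1
        elif allele_a == "ref" and allele_b == "alt":
            counts.ref_a_alt_b += 1
        elif allele_a == "alt" and allele_b == "alt":
            counts.alt_a_alt_b += 1
        # ref/ref fragments are implied: informative minus the three counts above
    return counts


def write_csv(rows: Iterable[PairCounts], out: TextIO) -> None:
    """Write one CSV row per (sample, A, B), with a header line."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(PairCounts)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
```

For example, with two alt/alt fragments, one ref/alt fragment, one fragment carrying a third allele at A, and one fragment that stops before B, `count_pair("tumor", "A", "B", ...)` gives 4 overlapping, 3 informative, and counts of 0, 1, and 2 for the three allele combinations. Reference/reference fragments are not stored separately because they follow from the other columns.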

One possible application for this data is to constrain phylogeny inference: if all the reads supporting variant A also support variant B, then mutation A probably occurred after B.
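A rough sketch of that ordering rule applied to the three pair counts. The `max_discordant` tolerance is an assumption added to absorb a few sequencing errors; it is not a proposed threshold, and there is no statistical model here. `order_constraint` is a hypothetical helper name.

```python
from typing import Optional


def order_constraint(alt_a_ref_b: int, ref_a_alt_b: int, alt_a_alt_b: int,
                     max_discordant: int = 0) -> Optional[str]:
    """Return "A after B", "B after A", or None if the counts do not order them.

    Heuristic only: if (nearly) every fragment carrying A's variant also
    carries B's variant, A likely arose on a haplotype that already had B.
    """
    if alt_a_alt_b == 0:
        return None  # never observed together: no nesting evidence
    a_nested_in_b = alt_a_ref_b <= max_discordant  # A's variant (almost) always with B's
    b_nested_in_a = ref_a_alt_b <= max_discordant  # B's variant (almost) always with A's
    if a_nested_in_b and not b_nested_in_a:
        return "A after B"
    if b_nested_in_a and not a_nested_in_b:
        return "B after A"
    return None  # inseparable in the data, or neither nested in the other
```

When both variants always co-occur, the reads cannot say which came first, so the sketch returns no constraint rather than guessing.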

timodonnell • Feb 29 '16 17:02

It might be useful to also implement this logic in varcode/topiary. Presumably, downstream tools would want to be aware of the presence and relative strandedness of secondary germline/somatic variants within a particular genomic distance (e.g., the length of a PGV peptide, to pick a specific example).
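A minimal sketch of the first step a downstream tool might take: listing pairs of variants that lie within some maximum distance of each other on the same contig, so that the phasing counts can be looked up for just those pairs. It uses a sliding window over position-sorted variants. The `(contig, position, variant_id)` tuples and the `nearby_pairs` name are hypothetical. Treating the window as plain genomic distance ignores introns, so for a peptide of length k a window of about 3k nucleotides is only a rough proxy for "within one peptide".

```python
from typing import List, Tuple

# Hypothetical variant record: (contig, position, variant_id)
Variant = Tuple[str, int, str]


def nearby_pairs(variants: List[Variant], max_distance: int) -> List[Tuple[str, str]]:
    """Return (upstream_id, downstream_id) pairs on the same contig within max_distance."""
    ordered = sorted(variants)  # groups by contig, then sorts by position
    pairs: List[Tuple[str, str]] = []
    start = 0  # left edge of the current window
    for i, (contig, pos, vid) in enumerate(ordered):
        # Drop variants on an earlier contig or too far upstream of pos.
        # Terminates because ordered[i] itself always satisfies the window.
        while ordered[start][0] != contig or pos - ordered[start][1] > max_distance:
            start += 1
        for j in range(start, i):
            pairs.append((ordered[j][2], vid))
    return pairs
```

For example, with variants at chr1:100, chr1:120, chr1:200, and chr2:110 and a 30 bp window, only the two chr1 variants 20 bp apart are reported as a pair.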

JPFinnigan • Mar 01 '16 21:03