guacamole
Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly
We have many SAM files that are used as gold sets for variant calling: 1. Document/log how each SAM file was created 2. Make this process reproducible
Some classes could use more explanation: - GenotypeFilter.scala - needs a file-level header comment describing the high-level idea - PileupFilter.scala - needs a file-level header comment. These filters seem to be...
I had mistakenly thought that no one tried calling copy number variants on exome seq due to amplification and capture biases, but apparently the original varscan2 paper found they could...
Follow on from #288, only retrieve allele counts at passing variants
The joint caller should optionally output a CSV file that gives, for pairs A, B of variants (both germline and somatic) at each sample: - total number of fragments (i.e....
Some filters we should probably have (going by artifacts I've observed in real data): - Strand bias - Variant reads mostly start or end at the same place - Variants...
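As a rough illustration of the strand-bias filter mentioned above, here is a minimal sketch. `StrandCounts`, `StrandBiasSketch`, and the thresholds `minAltReads` and `maxImbalance` are hypothetical names and values chosen for this example, not part of guacamole's existing filters. The sketch uses a plain strand-fraction cutoff, not a statistical test such as Fisher's exact test.

```scala
// Hypothetical type for illustration only; not guacamole's actual API.
// Holds per-strand read counts supporting the reference and alternate alleles at one site.
final case class StrandCounts(refForward: Int, refReverse: Int, altForward: Int, altReverse: Int)

object StrandBiasSketch {

  /** Fraction of alt-supporting reads on the forward strand, or None if there are no alt reads. */
  def altForwardFraction(counts: StrandCounts): Option[Double] = {
    val altTotal = counts.altForward + counts.altReverse
    if (altTotal == 0) None else Some(counts.altForward.toDouble / altTotal)
  }

  /**
   * Flags a call when its alt reads come almost entirely from one strand.
   * Both thresholds are assumed defaults picked for the example, not tuned values.
   */
  def hasStrandBias(counts: StrandCounts,
                    minAltReads: Int = 4,
                    maxImbalance: Double = 0.9): Boolean = {
    val altTotal = counts.altForward + counts.altReverse
    altTotal >= minAltReads &&
      altForwardFraction(counts).exists(f => f >= maxImbalance || f <= 1.0 - maxImbalance)
  }
}
```

With these assumed defaults, a site with 8 forward and 0 reverse alt reads is flagged, while a site with 4 forward and 4 reverse alt reads is not. Sites with fewer than `minAltReads` alt reads are never flagged, since the fraction is too noisy to interpret at low depth.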
Once we have the phasing information collected in #389, besides just writing it out we can also use it to make better calls. Somatic variants should all be "consistent"...
There's quite a lot of business logic in `VCFOutput` around [here](https://github.com/hammerlab/guacamole/blob/master/src/main/scala/org/hammerlab/guacamole/commands/jointcaller/VCFOutput.scala#L135) where we take `MultiSampleMultiAlleleEvidence` instances and make them into `htsjdk.VariantContext` instances. This makes it non-trivial to support other output...
Two issues here: 1. Estimate expected variant allele frequencies in the tumor sample 2. Estimate tumor contamination in normal sample for matched normals
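For both estimates, a useful starting point is the standard mixture formula for the expected variant allele frequency of a clonal somatic variant: mutant copies contributed by tumor cells divided by total copies from tumor and normal cells. The sketch below is only that formula. `ExpectedVaf` and its parameter names are hypothetical, and it assumes the tumor-cell fraction and copy numbers are already known, which is the hard part the issue is really about.

```scala
object ExpectedVaf {

  /**
   * Expected VAF of a clonal somatic variant in a mixed sample.
   *
   * @param tumorFraction    fraction of cells in the sample that are tumor cells (0 to 1)
   * @param mutatedCopies    copies of the locus carrying the variant in each tumor cell
   * @param tumorCopyNumber  total copies of the locus in each tumor cell
   * @param normalCopyNumber total copies of the locus in each normal cell
   */
  def expectedVaf(tumorFraction: Double,
                  mutatedCopies: Int = 1,
                  tumorCopyNumber: Int = 2,
                  normalCopyNumber: Int = 2): Double = {
    require(tumorFraction >= 0.0 && tumorFraction <= 1.0, "tumorFraction must be in [0, 1]")
    require(mutatedCopies >= 0 && mutatedCopies <= tumorCopyNumber,
      "mutatedCopies must be between 0 and tumorCopyNumber")
    val mutantCopies = tumorFraction * mutatedCopies
    val totalCopies = tumorFraction * tumorCopyNumber + (1.0 - tumorFraction) * normalCopyNumber
    if (totalCopies == 0.0) 0.0 else mutantCopies / totalCopies
  }
}
```

The same function covers both cases. For a tumor sample with 60% tumor cells, a heterozygous variant in a diploid region is expected at a VAF of about 0.3. For a matched normal contaminated with 5% tumor cells, the same variant is expected at about 0.025, which is why low-level alt evidence in the normal does not by itself rule out a somatic call.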