guacamole
Spark-based variant calling, with experimental support for multi-sample somatic calling (including RNA) and local assembly
Most jobs on a full genome print the following warning: ``` Not enough space to cache broadcast_2 in memory! (computed 488.3 MB so far) ``` What is the broadcast variable...
From the methods section of ["Recurrent somatic mutations in regulatory regions of human cancer genomes"](http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3332.html): > **Filtering out false positives from mapping errors and SNPs.** > SNPs from dbSNP Build...
We currently include clipped (either S or N cigar operator) reads in the pileup at a locus. For RNA-seq this is a performance issue (and arguably a programming gotcha) since...
- Upgrade to ADAM after https://github.com/bigdatagenomics/adam/pull/875 - Add back in test from https://github.com/hammerlab/guacamole/commit/564a7a4c3c8ee21ff3d56b7f9ec52bab4c875bcf - Update comment/functionality at https://github.com/hammerlab/guacamole/blob/master/src/main/scala/org/hammerlab/guacamole/pileup/PileupElement.scala#L132-L133
(summarizing @arahuja on the subject) **Simple Version** When many SNVs are found close together, consider dropping them under the assumption that something may have been wrong with the reads there....
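The "simple version" of this idea can be sketched as a window-based filter. This is only an illustration, not Guacamole's implementation: the function name `drop_clustered_snvs`, the `window` and `max_in_window` parameters, and their default values are all hypothetical, and the sketch assumes every position lies on a single contig.

```python
from typing import List


def drop_clustered_snvs(
    positions: List[int],
    window: int = 10,        # hypothetical: neighborhood radius in bp
    max_in_window: int = 2,  # hypothetical: max SNVs tolerated in a neighborhood
) -> List[int]:
    """Keep only SNV positions whose +/- `window` neighborhood is not crowded.

    Assumes all positions are on one contig.
    """
    positions = sorted(positions)
    n = len(positions)
    keep = []
    lo = 0  # index of the leftmost SNV within `window` of the current one
    hi = 0  # index of the rightmost SNV within `window` of the current one
    for pos in positions:
        while positions[lo] < pos - window:
            lo += 1
        while hi + 1 < n and positions[hi + 1] <= pos + window:
            hi += 1
        neighbors = hi - lo + 1  # count includes the SNV at `pos` itself
        if neighbors <= max_in_window:
            keep.append(pos)
    return keep


calls = [100, 103, 105, 500, 900, 905]
filtered = drop_clustered_snvs(calls, window=10, max_in_window=2)
```

Here the three calls at 100, 103 and 105 share one crowded neighborhood and are dropped, while the isolated call at 500 and the pair at 900 and 905 are kept.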
I wanted to discuss moving Guacamole back to a multi-module project. There are two main reasons I'd like to do this: 1. Separate out Spark and non-Spark components, it'd be...
I typically see more than a minute of reference-broadcasting at the beginning of apps before the first stage starts; we should be able to do something more efficient there.
As we run guacamole on more inputs, we have config blocks like [this](https://github.com/hammerlab/variant-calling-benchmarks/blob/07c8f43c5536a8d8dc2534f0fa871f64243912e0/benchmarks/pt189/benchmark.json) in [variant-calling-benchmarks](https://github.com/hammerlab/variant-calling-benchmarks) (VCB). Some of those JSON blocks correspond directly to structures that exist in Guacamole, e.g....
Spark's log4j wrapper is already used in a few places for e.g. logging warnings, and we should fold our logging into that infrastructure, where we currently have [a tiny homegrown...