Arun Ahuja comments

Results: 62 comments by Arun Ahuja

support writing Guacamole commands in separate projects

@timodonnell What are you thinking for this? Two things I could see are 1) having a GitHub repo with a cloneable example or 2) a Maven archetype? I...

consider supporting other output formats (csv, json, arrow?) in joint caller

We could even support a cross-language serialization format

--loci argument conflicts with TakeLociIterator

This can be resolved with `--trim-ranges`, but I think that should then be the default? Otherwise, I'm not sure how `TakeLociIterator` differentiates between empty loci and excluded loci.

exclude clipped reads from pileups

I could see the advantage of having the reads at the pileup as well since there is a difference between no reads mapped to that region or all the reads...

Apply assembly algorithms to variant-heavy regions

Related references:
1. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 1–9 (2014). doi:10.1038/ng.3036
2. Rizk, G., Gouin, A.,...

add a way for comparing somatic-standard calls to gold VCF as part of unit tests

We've developed a lot of this through the TCGA poster. I'm assembling a set of reads around ~100 variants that we can use as calibration going forward.
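A minimal sketch of what such a comparison could compute, assuming variants are represented as `(contig, position, ref, alt)` tuples already parsed from the called and gold VCFs. The function name `compare_calls` and the example variants are hypothetical and are not part of the project's code.

```python
from typing import Iterable, Tuple

Variant = Tuple[str, int, str, str]  # (contig, position, ref, alt); assumed representation


def compare_calls(called: Iterable[Variant], gold: Iterable[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of the called variants against the gold set."""
    called_set, gold_set = set(called), set(gold)
    true_positives = len(called_set & gold_set)
    # Guard against empty inputs so the function never divides by zero.
    precision = true_positives / len(called_set) if called_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical example data, for illustration only.
gold = [("chr1", 100, "A", "T"), ("chr1", 250, "G", "C"),
        ("chr2", 40, "C", "A"), ("chr2", 900, "T", "G")]
called = [("chr1", 100, "A", "T"), ("chr2", 40, "C", "A"), ("chr3", 7, "G", "A")]

precision, recall = compare_calls(called, gold)
```

A unit test could then assert that precision and recall stay above chosen thresholds for the calibration set.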

push down filtering by locus into a predicate

We can do this if we are loading ADAM reads or some other Parquet-serialized data. Thoughts on our approach to this, @ryan-williams @timodonnell?
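To illustrate why a columnar format helps here, the following toy sketch shows the general idea behind predicate pushdown: each block of reads carries min/max statistics, so a reader can skip whole blocks that cannot overlap the requested locus. The `RowGroup` class and `reads_in_locus` function are hypothetical stand-ins written for this illustration. They are not the Parquet, ADAM, or Guacamole API.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Read = Tuple[str, int]  # (contig, start); simplified read record for illustration


@dataclass
class RowGroup:
    """Hypothetical stand-in for a block of reads with min/max start statistics."""
    contig: str
    min_start: int
    max_start: int
    reads: List[Read]


def reads_in_locus(groups: List[RowGroup], contig: str, start: int, end: int) -> Iterator[Read]:
    """Yield reads whose start lies in [start, end) on the given contig."""
    for group in groups:
        # Pushed-down predicate: skip the whole group using only its statistics.
        if group.contig != contig or group.max_start < start or group.min_start >= end:
            continue
        # Only groups that may overlap the locus are scanned row by row.
        for read in group.reads:
            if read[0] == contig and start <= read[1] < end:
                yield read


groups = [
    RowGroup("chr1", 100, 900, [("chr1", 100), ("chr1", 500), ("chr1", 900)]),
    RowGroup("chr1", 1000, 1900, [("chr1", 1000), ("chr1", 1500), ("chr1", 1900)]),
    RowGroup("chr2", 50, 700, [("chr2", 50), ("chr2", 700)]),
]

selected = list(reads_in_locus(groups, "chr1", 400, 1200))
```

In this example the `chr2` group is never scanned, which is the saving that filtering by locus at load time is meant to provide.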

Make a public bucket or NFS filer available w/ sample data

Both of those are private buckets. The DREAM data is available in a public bucket at `gs://public-dream-data/`

Invalid biotype error when loading effects from TCGA-BLCA cohort

I've seen this as well with ensembl85 in `varcode`. @iskandr mentioned this is fixed in the latest `pyensembl`, but we need to upgrade to the new `varcode` and `pyensembl` interfaces in `cohorts`. I've...

Find public or create simulated data for worked example

Closed by @jburos's work in #147?