Calling with Avocado using the "hive" range partitioned data

Open · jpdna opened this issue 7 years ago · 0 comments

Hi @fnothaft, I'd like to demonstrate joint genotype calling with Avocado over specific genomic regions, using the binned "hive-style" partitioned data. Inputs:

  1. gVCF files for 10s to 100s of samples, saved as bin range-partitioned ADAM Parquet datasets.
  2. BAM files saved as bin-partitioned ADAM datasets.
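To make the bin-partitioned inputs concrete, here is a minimal sketch of how a region query could be pruned to a handful of hive-style partition directories instead of scanning the whole dataset. The `contig=<name>/bin=<n>` directory layout, the 1 Mbp bin width, and all function names are hypothetical and chosen for illustration; the actual partition column names and bin size in the ADAM branch may differ. Because records are assumed to be binned by start position, a read or gVCF reference block that starts in the preceding bin can still overlap the region, so the sketch reads one bin of lookback and then applies an exact overlap filter.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical bin width in base pairs; the real dataset may use another value.
BIN_SIZE = 1_000_000


@dataclass(frozen=True)
class Region:
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def bin_of(position: int, bin_size: int = BIN_SIZE) -> int:
    """Bin index of a 0-based position."""
    return position // bin_size


def bins_for_region(region: Region, bin_size: int = BIN_SIZE, lookback: int = 1) -> List[int]:
    """Bins to read for a region query.

    Records are assumed to be binned by start position, so `lookback` extra
    bins before the region are included to catch records that start earlier
    but still overlap it. This is safe as long as no record is longer than
    lookback * bin_size bases.
    """
    if region.end <= region.start:
        raise ValueError("region must be non-empty")
    first = max(0, bin_of(region.start, bin_size) - lookback)
    last = bin_of(region.end - 1, bin_size)
    return list(range(first, last + 1))


def partition_paths(root: str, region: Region, bin_size: int = BIN_SIZE, lookback: int = 1) -> List[str]:
    """Hive-style directories to scan, using a hypothetical contig=/bin= layout."""
    return [
        f"{root}/contig={region.contig}/bin={b}"
        for b in bins_for_region(region, bin_size, lookback)
    ]


def overlaps(contig: str, start: int, end: int, region: Region) -> bool:
    """Exact half-open overlap test applied after the coarse bin pruning."""
    return contig == region.contig and start < region.end and end > region.start
```

Under these assumptions, a query for `chr1:1,500,000-2,100,000` (0-based, half-open) touches only bins 0, 1 and 2, so three directories are listed and read, and the overlap filter then discards the extra records pulled in by the lookback bin.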

The application I imagine is on-the-fly recalling of specific regions: when new samples are added, a set of candidate regions needs to be re-examined in near real time. This would include a feature that lets the user provide a BED file of regions to call, as GenotypeGVCFs allows in GATK/HaplotypeCaller.
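The BED-driven mode could start from something like the following minimal sketch, which reads target regions and merges overlapping or abutting intervals so each stretch of the genome is called once. It assumes standard BED semantics (tab-separated columns, 0-based start, exclusive end, `#`, `track` and `browser` header lines skipped); the function names are hypothetical and not an existing Avocado option. Rejecting zero-length intervals is a design choice made here, since there is nothing to call in them.

```python
from typing import Iterable, List, Tuple

# (contig, start, end) with a 0-based inclusive start and exclusive end.
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three BED columns, ignoring any extra columns."""
    intervals: List[Interval] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines, comments and UCSC-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated columns")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        # Zero-length or inverted intervals contain no bases to call.
        if start < 0 or end <= start:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        intervals.append((contig, start, end))
    return intervals


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals on the same contig."""
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            prev_contig, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged
```

For example, targets `chr1 100 200`, `chr1 150 300` and `chr1 300 400` collapse into the single interval `chr1 100 400`, which avoids re-reading and re-calling the same bases when user-supplied targets overlap.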

My plan is to make Avocado load partitioned data from my ADAM "hive" binned-dataset branch; with that in place, I expect it will just work, and I'll then measure performance. Let me know if you have suggestions or comments about how useful this would be.

jpdna, Jan 06 '18 20:01