Joint calling with Avocado using "hive-style" range-partitioned data
Hi @fnothaft - I'd like to demonstrate joint calling of genotypes using Avocado for specific genomic regions, using the binned "hive-style" partitioned data. Input:
- gVCF files for tens to hundreds of samples, saved as bin-range-partitioned ADAM Parquet datasets
- BAM files saved as ADAM bin-partitioned datasets
The application I have in mind is on-the-fly recalling of a specific region when new samples are added and a set of candidate regions needs to be examined in near real time. This would include a feature allowing the user to provide a BED file of regions to call, as GenotypeGVCFs allows in GATK/HaplotypeCaller.
My plan is to make Avocado able to load partitioned data from my ADAM "hive" binned dataset branch. With that in place I think it will just work, and I'll measure performance. Let me know if you have suggestions or comments about the usefulness of this.
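
To make the BED-driven region selection concrete, here is a minimal sketch in plain Python (not Avocado or ADAM code) of how a BED file could be translated into the set of position bins that need to be read. The bin width of 1,000,000 bp and the function names `parse_bed` and `bins_for_regions` are assumptions for illustration only; the actual bin size and partition column names depend on how the dataset was written.

```python
from collections import defaultdict

# Assumed bin width in base pairs (hypothetical; use whatever width
# the partitioned dataset was actually written with).
BIN_SIZE = 1_000_000


def parse_bed(lines):
    """Parse BED lines into (contig, start, end) tuples.

    BED coordinates are 0-based and half-open: [start, end).
    """
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def bins_for_regions(regions, bin_size=BIN_SIZE):
    """Map each contig to the sorted list of bin indices its regions overlap."""
    bins = defaultdict(set)
    for contig, start, end in regions:
        if end <= start:
            continue  # ignore empty or inverted intervals
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size  # end is exclusive
        bins[contig].update(range(first_bin, last_bin + 1))
    return {contig: sorted(idx) for contig, idx in bins.items()}


if __name__ == "__main__":
    bed_lines = ["chr1\t999990\t1000010", "chr2\t0\t1000000"]
    print(bins_for_regions(parse_bed(bed_lines)))
    # prints {'chr1': [0, 1], 'chr2': [0]}
```

The resulting contig-to-bins map could then be turned into a partition filter so that only the matching directories are scanned, with a finer start/end filter applied to the rows that are read. Whether that pruning actually delivers near-real-time recalling is exactly what the performance measurement would need to show.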