
Adding VCF index argument to SelectVariants

Open meganshand opened this issue 2 years ago • 3 comments

Feature request

Tool(s) or class(es) involved

SelectVariants

Description

To run SelectVariants on VCF inputs that are stored in a different location from their index files, or to stream SelectVariants over HTTPS from Azure Blob Storage, we need a way to provide the index file in a separate argument from the -V input. @jamesemery started thinking this through (copying this from Slack):

In FeatureDataSource.getTribbleFeatureReader() we currently initialize the data sources in getFeatureReader(), which gets called by VariantWalker.initializeDrivingVariants(). You could add an override there that threads the path for the index source down through that call chain and, only if the index is explicitly supplied by the user, pushes it into the getTribbleFeatureReader() calls at the bottom of the stack.
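
To make the suggestion above concrete, here is a minimal Java sketch of threading an optional index path through a three-layer call chain. It is not GATK or HTSJDK code: `IndexThreadingSketch`, `ReaderSpec`, `openDrivingVariants`, `openFeatureReader`, `openTribbleReader`, and `siblingIndexFor` are all hypothetical names that only mirror the shape of the stack described, and the example paths are made up. The fallback assumes the usual convention of a `.tbi` index next to a bgzipped VCF and a `.idx` index next to a plain VCF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical sketch: none of these names are real GATK or HTSJDK APIs.
class IndexThreadingSketch {

    // Stand-in for what the lowest-level reader needs in order to open a file.
    static final class ReaderSpec {
        final Path variants;
        final Path index;

        ReaderSpec(Path variants, Path index) {
            this.variants = variants;
            this.index = index;
        }
    }

    // Top of the stack: the tool hands over the user-supplied index, if any.
    static ReaderSpec openDrivingVariants(Path variants, Optional<Path> explicitIndex) {
        return openFeatureReader(variants, explicitIndex);
    }

    // Middle layer: does nothing with the index except forward it.
    static ReaderSpec openFeatureReader(Path variants, Optional<Path> explicitIndex) {
        return openTribbleReader(variants, explicitIndex);
    }

    // Bottom of the stack: use the explicit index only when one was supplied,
    // otherwise fall back to the index expected next to the variants file.
    static ReaderSpec openTribbleReader(Path variants, Optional<Path> explicitIndex) {
        Path index = explicitIndex.orElseGet(() -> siblingIndexFor(variants));
        return new ReaderSpec(variants, index);
    }

    // Assumed convention: ".tbi" beside bgzipped files, ".idx" otherwise.
    static Path siblingIndexFor(Path variants) {
        String name = variants.getFileName().toString();
        String suffix = name.endsWith(".gz") ? ".tbi" : ".idx";
        return variants.resolveSibling(name + suffix);
    }

    public static void main(String[] args) {
        Path vcf = Paths.get("/data/calls.vcf.gz");

        // No explicit index: behaviour is unchanged from today.
        ReaderSpec byConvention = openDrivingVariants(vcf, Optional.empty());
        System.out.println(byConvention.index);

        // Explicit index stored somewhere else entirely.
        ReaderSpec explicit =
                openDrivingVariants(vcf, Optional.of(Paths.get("/indices/calls.vcf.gz.tbi")));
        System.out.println(explicit.index);
    }
}
```

Keeping the index as an `Optional` all the way down means existing callers that never supply one keep their current behaviour, which matches the "only if the index is explicitly supplied" condition in the suggestion.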

@droazen any thoughts on this? @VJalili Would adding this feature to SelectVariants be useful for your pipelines at all?

meganshand, Oct 27 '23 20:10

@meganshand We would like to add in a global mechanism for passing in explicit VCF indices, which we have long had for BAM/CRAM indices. @ldgauthier has requested this many times as well. We have a mechanism implemented in HTSJDK that allows you to pass in a JSON file (which we call a Bundle) containing URLs to both the main file and companion index, wherever you can currently pass in a raw file URL. Would this meet your needs?

droazen, Oct 30 '23 06:10

@droazen Yes, that sounds very convenient to use.

meganshand, Oct 30 '23 13:10

Hi, I would also like to be able to pass FASTA index files stored in separate locations to HaplotypeCaller. Is there currently an option for that? I see that --read-index is for passing a .bai explicitly, but I do not see any such flag for .fai.

mchenaux, Nov 30 '23 22:11