[mutect2 workflow] variants_for_contamination_idx is never used
Bug Report
Hi, we are using the Dockstore version of the GATK variant calling pipeline that leverages Mutect2: github.com/broadinstitute/gatk/mutect2:4.1.8.1
We're processing human glioma data. We currently make it through much of the pipeline, but fail on GetPileupSummaries. There's a thread about it on the discussion board [here](https://gatk.broadinstitute.org/hc/en-us/community/posts/6179012337819-No-Pileup-Tables).
We are specifying a file for variants_for_contamination, and a file for variants_for_contamination_idx in the workflow, but the index is never passed to GetPileupSummaries, and it fails with this enigmatic error message:
A USER ERROR has occurred: An index is required but was not found for file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.
If you check out the source code in mutect2.wdl, you can see that the input variable variants_for_contamination_idx, which we have thoughtfully set and passed into the workflow, is never actually referenced in the GetPileupSummaries command. From reading the docs, I'm not even sure the tool has an option to pass the index. Here is an example of how the command is being called within our workflow:
gatk --java-options "-Xmx149500m" GetPileupSummaries -R gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/b4adf777-4f97-425c-b3e2-b37c9d927667/call-GatherBamFiles/SRR7588418.hg38.bam --interval-set-rule INTERSECTION -L gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/81583498-648e-4e70-8452-80509b626927/Mutect2/dbb6ef96-ea07-4cfe-9e85-3b133c6d89ea/call-SplitIntervals/cacheCopy/glob-0fc990c5ca95eebc97c4c204e3e303e1/0030-scattered.interval_list \
-V gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz -L gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz -O tumor-pileups.table
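Since the command has no argument for the index, our working guess is that the tool looks for it next to the VCF by naming convention. Below is a minimal sketch of the check we could run on our inputs. It assumes the usual convention of a `.tbi` file beside a block-compressed VCF and a `.idx` file beside a plain VCF. The helper names are ours, and the index path in the usage line is a hypothetical example, not our real file.

```python
def expected_index_path(feature_path: str) -> str:
    """Return the sibling index path implied by the naming convention.

    Assumption: block-compressed VCFs use a tabix index (<vcf>.tbi) and
    plain VCFs use a Tribble index (<vcf>.idx), stored beside the VCF.
    """
    if feature_path.endswith((".vcf.gz", ".vcf.bgz")):
        return feature_path + ".tbi"
    if feature_path.endswith(".vcf"):
        return feature_path + ".idx"
    raise ValueError(f"not a recognised VCF path: {feature_path}")


def index_is_colocated(feature_path: str, index_path: str) -> bool:
    """True if index_path is exactly where the convention says it should be."""
    return index_path == expected_index_path(feature_path)


vcf = ("gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/"
       "terra_reference_files/small_exac_common_3.hg38.vcf.gz")
# Hypothetical index location, used only to illustrate a mismatch.
idx = "gs://some-other-bucket/indexes/small_exac_common_3.hg38.vcf.gz.tbi"

print(expected_index_path(vcf))
print(index_is_colocated(vcf, idx))
```

If the second line prints `False` for our real inputs, that would explain why setting variants_for_contamination_idx makes no difference when the tool reads the VCF straight from the bucket.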
Have you encountered this issue before? Is there a problem with the .gz file, or does IndexFeatureFile need to be a required step for variant contamination filtering, or does there need to be a supported option to pass the path to the index? Any help would be greatly appreciated, thank you!