Reference Allele Error on dbSNP on step HaplotypeCaller CNN1

Open • barslmn opened this issue 1 year ago • 2 comments

Description of the bug

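Hi, this concerns the GATK resource bundle, but I am opening the bug report here because I encountered the problem while using this pipeline. dbSNP138 (dbsnp_138.b37.vcf.gz) has the wrong reference allele at position chr6:31236715: GATK reports GC in the called variants versus GA in the resource.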

image
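For anyone who wants to confirm this kind of mismatch themselves, here is a minimal sketch of the check: compare a VCF record's REF column with the reference bases at the same 1-based position. The function names and the toy contig are made up for illustration; the toy sequence is not real genome sequence, it only mirrors the GC vs. GA situation from the error message.

```python
def ref_allele_matches(reference, pos, ref_allele):
    """Return True if ref_allele equals the reference bases starting at 1-based pos."""
    start = pos - 1  # VCF POS is 1-based, Python strings are 0-based
    return reference[start:start + len(ref_allele)].upper() == ref_allele.upper()


def check_vcf_line(reference, vcf_line):
    """Parse one tab-separated VCF data line and check its REF column."""
    chrom, pos, _id, ref = vcf_line.rstrip("\n").split("\t")[:4]
    return chrom, int(pos), ref, ref_allele_matches(reference, int(pos), ref)


# Toy contig (hypothetical, not real sequence): bases 5-6 are "GC".
toy_reference = "ACGTGCTTAA"

good = check_vcf_line(toy_reference, "6\t5\t.\tGC\tG\t.\t.\t.")
bad = check_vcf_line(toy_reference, "6\t5\t.\tGA\tG\t.\t.\t.")
print(good)
print(bad)
```

With real data you would load the contig from the FASTA that matches the genome build and feed in the dbSNP record at the reported position.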

Command used and terminal output

Command was:
nextflow run nf-core/sarek -profile docker --input WES_samplesheet.csv --outdir WES_output/ --genome GATK.GRCh37 --intervals GRCh37_exome_all_.bed --wes --save_reference --tools deepvariant,haplotypecaller --max_cpus 3 --max_memory 20.GB -with-trace --skip_tools baserecalibrator -resume
Terminal output looks like this:
-[nf-core/sarek] Pipeline completed with errors-
Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_HAPLOTYPECALLER:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES (mysample)'
Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_HAPLOTYPECALLER:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES (mysample)` terminated with an error exit status (2)
Command executed:
  gatk --java-options "-Xmx12g" FilterVariantTranches \
      --variant mysample.cnn.vcf.gz \
      --resource dbsnp_138.b37.vcf.gz --resource 1000G_phase1.indels.b37.vcf.gz --resource Mills_and_1000G_gold_standard.indels.b37.vcf.gz --resource 1000G_phase1.snps.high_confidence.b37.vcf.gz \
      --output mysample.haplotypecaller.filtered.vcf.gz \
      --tmp-dir . \
      --info-key CNN_1D
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_HAPLOTYPECALLER:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
  END_VERSIONS
 
.
.
.
o retrieve a sequence dictionary from the associated index file
[95/1923]  16:07:11.173 WARN  IntelInflater - Zero Bytes Written : 0
16:07:11.184 INFO  FilterVariantTranches - Done initializing engine
16:07:11.294 INFO  ProgressMeter - Starting traversal
16:07:11.295 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
16:07:11.296 INFO  FilterVariantTranches - Starting pass 0 through the variants
16:07:24.653 INFO  ProgressMeter -          1:150597890              0.2                  3000          13476.1
16:07:39.146 INFO  ProgressMeter -           2:68402012              0.5                  6000          12926.4
16:07:54.029 INFO  ProgressMeter -           3:44283525              0.7                  9000          12636.6
16:08:04.120 INFO  ProgressMeter -             4:762963              0.9                 11000          12494.3
16:08:15.045 INFO  ProgressMeter -            5:1216775              1.1                 13000          12235.5
16:08:25.728 INFO  ProgressMeter -          5:180235722              1.2                 15000          12091.4
16:08:28.560 INFO  FilterVariantTranches - Filtered 0 SNPs out of 14542 and filtered 0 indels out of 1233 with INFO score: CNN_1D.
16:08:28.563 INFO  FilterVariantTranches - Shutting down engine
[December 20, 2022 at 4:08:28 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.FilterVariantTranches done. Elapsed time: 1.32 minutes.   Runtime.totalMemory()=2260729856
***********************************************************************
A USER ERROR has occurred: Bad input: The provided variant file(s) have inconsistent references for the same position(s) at 6:31236715, GC* in input vs. GA* in resource
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Work dir:
  /home/owiepoc/work/37/f047461c745cbc3a799066d76ae620
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
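The USER ERROR above says the called variants and a resource file disagree on the reference allele at one site (as far as I know, the trailing `*` in `GC*` and `GA*` just marks the allele as the reference allele). Here is a rough sketch of how one could scan two VCFs for such sites before running the filter step. It is an approximation written for illustration, not GATK's actual check: the function names are hypothetical, only one record per site is kept, and REF alleles are compared over their shared leading bases. The two toy records reuse the locus and alleles from the error message; every other field is made up.

```python
def ref_by_site(vcf_lines):
    """Map (CHROM, POS) -> REF for the data lines of a VCF."""
    sites = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        sites[(chrom, int(pos))] = ref.upper()  # later records overwrite earlier ones
    return sites


def inconsistent_refs(input_lines, resource_lines):
    """List shared sites whose REF alleles disagree on their common leading bases."""
    input_sites = ref_by_site(input_lines)
    conflicts = []
    for site, res_ref in ref_by_site(resource_lines).items():
        in_ref = input_sites.get(site)
        if in_ref is None:
            continue  # site only present in the resource
        n = min(len(in_ref), len(res_ref))
        if in_ref[:n] != res_ref[:n]:
            conflicts.append((site[0], site[1], in_ref, res_ref))
    return sorted(conflicts)


# Toy records: locus and REF alleles from the error message, other fields invented.
called = ["##fileformat=VCFv4.2", "6\t31236715\t.\tGC\tG\t.\t.\t."]
resource = ["##fileformat=VCFv4.2", "6\t31236715\t.\tGA\tG\t.\t.\t."]

conflicts = inconsistent_refs(called, resource)
print(conflicts)
```

For real files, the lines could come from `gzip.open(path, "rt")`, which also reads bgzip-compressed VCFs.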

Relevant files

No response

System information

nextflow version 22.10.3.5834
Container engine docker
Local executor
OS: Ubuntu 20.04.5 LTS x86_64
Host: VMware Virtual Platform None
Kernel: 5.4.0-135-generic
Uptime: 22 days, 15 mins
Packages: 775 (dpkg), 6 (snap)
Shell: bash 5.0.17
Resolution: preferred
Terminal: /dev/pts/0
CPU: Intel Xeon Platinum 8360Y (20) @ 2.394GHz
GPU: 00:0f.0 VMware SVGA II Adapter
Memory: 3377MiB / 140829MiB

barslmn, Dec 29 '22 07:12

My workaround:

I downloaded a newer dbSNP build (b151, GRCh37p13) from NCBI and passed the two files below to the pipeline with --dbsnp and --dbsnp_tbi.

https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/00-common_all.vcf.gz
https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/00-common_all.vcf.gz.tbi
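A small sketch of how this workaround could be scripted. The two URLs and the `--dbsnp` / `--dbsnp_tbi` options are the ones mentioned above; the helper names, the `refs/` directory, and the shortened base command are assumptions for illustration only. Nothing is downloaded unless `download_dbsnp` is called explicitly.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/"
FILES = ("00-common_all.vcf.gz", "00-common_all.vcf.gz.tbi")


def download_dbsnp(dest_dir):
    """Fetch the VCF and its index into dest_dir and return both local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in FILES:
        target = dest / name
        if not target.exists():  # skip files that are already present
            urlretrieve(BASE_URL + name, str(target))
        paths.append(target)
    return tuple(paths)


def add_dbsnp_args(command, vcf, tbi):
    """Return a copy of the argument list with the dbSNP overrides appended."""
    return list(command) + ["--dbsnp", str(vcf), "--dbsnp_tbi", str(tbi)]


# Shortened base command; add the remaining options from the original run as needed.
base = ["nextflow", "run", "nf-core/sarek", "-profile", "docker", "--genome", "GATK.GRCh37"]
# "refs/" is a hypothetical local directory holding the downloaded files.
cmd = add_dbsnp_args(base, "refs/00-common_all.vcf.gz", "refs/00-common_all.vcf.gz.tbi")
print(" ".join(cmd))
```

To actually fetch the files first, call `vcf, tbi = download_dbsnp("refs")` and pass the returned paths to `add_dbsnp_args`.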

barslmn, Feb 21 '23 11:02

This is still a problem in 3.4.0.

j-andrews7, Feb 09 '24 23:02