error in test by haplotypecaller

Open jbague opened this issue 1 year ago • 8 comments

Description of the bug

I cannot complete the haplotypecaller test run using the Singularity container. I get the same error when I run my own germline samples. The pipeline fails at the last step of variant calling. When I run the test with strelka instead, the pipeline finishes correctly.

Command used and terminal output

nextflow run 3_4_0/main.nf -profile test,singularity --tools haplotypecaller --outdir ./results

[15/8868cb] process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:VCF_VARIANT_FILTERING_GATK:FILTERVA... [100%] 1 of 1, failed: 1 ✘
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:BCFTOOLS_STATS                                 -
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT                            -
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_QUAL                             -
[-        ] process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_SUMMARY                               -
[-        ] process > NFCORE_SAREK:SAREK:CUSTOM_DUMPSOFTWAREVERSIONS                                             -
[-        ] process > NFCORE_SAREK:SAREK:MULTIQC                                                                 -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/sarek] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES (test)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES (test)` terminated with an error exit status (2)

Command executed:

  gatk --java-options "-Xmx5324M -XX:-UsePerfData" \
      FilterVariantTranches \
      --variant test.cnn.vcf.gz \
      --resource dbsnp_146.hg38.vcf.gz --resource mills_and_1000G.indels.vcf.gz \
      --output test.haplotypecaller.filtered.vcf.gz \
      --tmp-dir . \
      --info-key CNN_1D
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
  END_VERSIONS

Command exit status:
  2

Command output:
  (empty)

Command error:
  Using GATK jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5324M -XX:-UsePerfData -jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar FilterVariantTranches --variant test.cnn.vcf.gz --resource dbsnp_146.hg38.vcf.gz --resource mills_and_1000G.indels.vcf.gz --output test.haplotypecaller.filtered.vcf.gz --tmp-dir . --info-key CNN_1D
  16:07:18.011 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
  16:07:18.048 INFO  FilterVariantTranches - ------------------------------------------------------------
  16:07:18.052 INFO  FilterVariantTranches - The Genome Analysis Toolkit (GATK) v4.4.0.0
  16:07:18.052 INFO  FilterVariantTranches - For support and documentation go to https://software.broadinstitute.org/gatk/
  16:07:18.052 INFO  FilterVariantTranches - Executing as bague@cbp10055 on Linux v5.15.0-92-generic amd64
  16:07:18.052 INFO  FilterVariantTranches - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src
  16:07:18.052 INFO  FilterVariantTranches - Start Date/Time: February 7, 2024 at 4:07:17 PM GMT
  16:07:18.052 INFO  FilterVariantTranches - ------------------------------------------------------------
  16:07:18.052 INFO  FilterVariantTranches - ------------------------------------------------------------
  16:07:18.053 INFO  FilterVariantTranches - HTSJDK Version: 3.0.5
  16:07:18.053 INFO  FilterVariantTranches - Picard Version: 3.0.0
  16:07:18.053 INFO  FilterVariantTranches - Built for Spark Version: 3.3.1
  16:07:18.053 INFO  FilterVariantTranches - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  16:07:18.054 INFO  FilterVariantTranches - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  16:07:18.054 INFO  FilterVariantTranches - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  16:07:18.054 INFO  FilterVariantTranches - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  16:07:18.054 INFO  FilterVariantTranches - Deflater: IntelDeflater
  16:07:18.054 INFO  FilterVariantTranches - Inflater: IntelInflater
  16:07:18.054 INFO  FilterVariantTranches - GCS max retries/reopens: 20
  16:07:18.054 INFO  FilterVariantTranches - Requester pays: disabled
  16:07:18.055 INFO  FilterVariantTranches - Initializing engine
  16:07:18.129 INFO  FeatureManager - Using codec VCFCodec to read file file://dbsnp_146.hg38.vcf.gz
  16:07:18.132 WARN  IntelInflater - Zero Bytes Written : 0
  16:07:18.139 INFO  FeatureManager - Using codec VCFCodec to read file file://mills_and_1000G.indels.vcf.gz
  16:07:18.140 WARN  IntelInflater - Zero Bytes Written : 0
  16:07:18.146 INFO  FeatureManager - Using codec VCFCodec to read file file://test.cnn.vcf.gz
  16:07:18.147 WARN  IntelInflater - Zero Bytes Written : 0
  16:07:18.148 WARN  IntelInflater - Zero Bytes Written : 0
  16:07:18.152 INFO  FilterVariantTranches - Done initializing engine
  16:07:18.168 INFO  ProgressMeter - Starting traversal
  16:07:18.169 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
  16:07:18.169 INFO  FilterVariantTranches - Starting pass 0 through the variants
  16:07:18.170 WARN  IntelInflater - Zero Bytes Written : 0
  16:07:18.171 INFO  FilterVariantTranches - Finished pass 0 through the variants
  16:07:18.171 INFO  FilterVariantTranches - Found 0 SNPs and 0 indels with INFO score key:CNN_1D.
  16:07:18.171 INFO  FilterVariantTranches - Found 0 SNPs and 0 indels in the resources.
  16:07:18.171 INFO  FilterVariantTranches - Filtered 0 SNPs out of 0 and filtered 0 indels out of 0 with INFO score: CNN_1D.
  16:07:18.173 INFO  FilterVariantTranches - Shutting down engine
  [February 7, 2024 at 4:07:18 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.FilterVariantTranches done. Elapsed time: 0.00 minutes.
  Runtime.totalMemory()=125829120
  ***********************************************************************
  
  A USER ERROR has occurred: Bad input: VCF contains no variants or no variants with INFO score key "CNN_1D"
  
  ***********************************************************************
  Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

Work dir:
  /media/bague/D_2/descriptiu_marato_tv3/prova_nf_3_4_0/work/15/8868cbf90b18654a33c937f88a5bc9

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

Nextflow version: 23.04.0
Hardware: Desktop
Executor: local
Container engine: Singularity
OS: Ubuntu 20.04
Version of nf-core/sarek: 3.4.0

jbague avatar Feb 07 '24 16:02 jbague

I tried both GATK genomes (GRCh37 and GRCh38), but I cannot get past the error. I am not sure why haplotypecaller fails while the whole pipeline runs fine with the strelka caller. It looks like the error begins when the pipeline needs the GATK Singularity container...

jbague avatar Feb 08 '24 14:02 jbague

I also reported that error: https://github.com/nf-core/sarek/issues/1146

Could you try running the pipeline with the option --skip_tools haplotyper_filter?

https://nf-co.re/sarek/3.4.0/parameters#skip_tools

asp8200 avatar Feb 08 '24 14:02 asp8200

I have just tried it, but the pipeline still finishes with errors...

jbague avatar Feb 08 '24 15:02 jbague

Come on over on nf-core/sarek slack, and let's take a look at those errors

asp8200 avatar Feb 08 '24 15:02 asp8200

Excuse me, I have now used the option --skip_tools haplotypecaller_filter and the error no longer occurs. The pipeline finished correctly. However, I have a question: if we use this parameter to bypass the error, we are removing the filtering steps, so how should we handle the downstream analysis? Is it recommended to apply an extra filter directly to the resulting VCF?

jbague avatar Feb 08 '24 15:02 jbague
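
For reference, the workaround described in the comment above can be sketched as a small shell helper. The flags all come from this thread (the original command, `--skip_tools haplotypecaller_filter`, and the `-resume` tip in the Nextflow output); the function name `rerun_without_filter` and the `RUN` switch are made up for illustration.

```shell
# Hypothetical helper: prints the rerun command by default,
# and only launches it when RUN=1 is set in the environment.
rerun_without_filter() {
    # Paths and profile are copied from the report above; adjust as needed.
    set -- nextflow run 3_4_0/main.nf \
        -profile test,singularity \
        --tools haplotypecaller \
        --skip_tools haplotypecaller_filter \
        --outdir ./results \
        -resume
    if [ "${RUN:-0}" = "1" ]; then
        "$@"          # actually run the pipeline
    else
        echo "$*"     # dry run: show the command for review
    fi
}
```

Calling `rerun_without_filter` prints the command, and `RUN=1 rerun_without_filter` would launch it. Note that this skips the filtering step for every sample in the run, which is the concern raised in the question above.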

> Excuse me, I have now used the option --skip_tools haplotypecaller_filter and the error no longer occurs. The pipeline finished correctly. However, I have a question: if we use this parameter to bypass the error, we are removing the filtering steps, so how should we handle the downstream analysis? Is it recommended to apply an extra filter directly to the resulting VCF?

I would run Sarek with the haplotypecaller-filter activated and only deactivate it for the odd sample with "no variants or no variants with INFO score key CNN_1D" - perhaps @maxulysse, @FriederikeHanssen or @tdanhorn has more to say on this?

asp8200 avatar Feb 08 '24 15:02 asp8200
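
To find the "odd" samples mentioned above, one option is to count the records that carry a `CNN_1D` annotation before the filtering step runs. A minimal sketch, assuming a gzip- or bgzip-compressed VCF and standard `gzip` and `awk`; the function name `count_cnn_records` is hypothetical and not part of Sarek or GATK:

```shell
# Count VCF records whose INFO column (column 8) contains a CNN_1D score.
# Assumes a gzip/bgzip-compressed, tab-separated VCF.
count_cnn_records() {
    vcf=$1
    gzip -cd "$vcf" | awk -F '\t' '
        /^#/ { next }                   # skip header lines
        $8 ~ /(^|;)CNN_1D=/ { n++ }     # INFO field carries a CNN_1D key
        END { print n + 0 }             # print 0 when nothing matched
    '
}
```

For example, `count_cnn_records test.cnn.vcf.gz` could be run in the task work directory. A count of 0 corresponds to the condition that FilterVariantTranches rejects in the log above, so only those samples would need `--skip_tools haplotypecaller_filter`.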