
[BUG] Mutect2 fails

Open bounlu opened this issue 2 years ago • 4 comments

I ran sarek as follows:

nextflow run nf-core/sarek \
-profile docker \
-dsl1 \
--sentieon \
--step mapping \
--tools tnscope,mutect2,strelka,snpeff,vep \
--input '/Users/omeran/Desktop/aws/sarek/samplesheet.tsv' \
--outdir 's3://omeran/nextflow/sarek/results/' \
-bucket-dir 's3://omeran/nextflow/sarek/work/' \
-c '/Users/omeran/Desktop/aws/sarek/custom.config' \
-r master

It works fine for all steps up to variant calling. However, I keep getting the following error at the Mutect2 step:

Execution cancelled -- Finishing pending tasks before exit
WARN: Got an interrupted exception while taking agent result | java.lang.InterruptedException
Error executing process > 'Mutect2Single (FFLC85_Novaseq_tumour-chr16_46380683-90228345)'

Caused by:
  Task failed to start - DockerTimeoutError: Could not transition to created; timed out after waiting 4m0s

Command executed:

  # Get raw calls
  gatk --java-options "-Xmx7g"       Mutect2       -R Homo_sapiens_assembly38.fasta      -I FFLC85_Novaseq_tumour.recal.bam  -tumor FFLC85_Novaseq_tumour       -L chr16_46380683-90228345.bed              --germline-resource gnomAD.r2.1.1.GRCh38.PASS.AC.AF.only.vcf.gz              -O chr16_46380683-90228345_FFLC85_Novaseq_tumour.vcf

Command exit status:
  1

Command output:
  (empty)

Command wrapper:
  nxf-scratch-dir ip-172-31-45-123.ap-southeast-1.compute.internal:/tmp/nxf.8LordsTTl5
  
  An error occurred (AllAccessDisabled) when calling the ListObjectsV2 operation: All access to this object has been disabled
  fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
  
  An error occurred (AllAccessDisabled) when calling the ListObjectsV2 operation: All access to this object has been disabled
  fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
  download failed: s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta to ./Homo_sapiens_assembly38.fasta [Errno 12] Cannot allocate memory
  main: line 266:   112 Killed                  /home/ec2-user/.awscliv2/binaries/aws s3 cp --only-show-errors "$source" "$target"

Work dir:
  s3://omeran/nextflow/sarek/work/6b/0140014bdc844aebf4ce63b3cdb2b7

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line


Unexpected error [AbortedException]


Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler


Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler


Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler


-[nf-core/sarek] Pipeline completed with errors-
WARN: Killing running tasks (271)

When I searched for the error, I found two possible causes:

  1. Wrong IAM permissions -> here

I verified that this is not the case for my own buckets, as the relevant role (ecsInstanceRole) has AmazonS3FullAccess.

However, I am not sure whether this also covers permissions on the iGenomes bucket.

  2. Non-existent folder path -> here

I am not sure how to rectify this, as I checked the path to the reference FASTA file and it exists.

$ nextflow info
  Version: 22.04.0 build 5697
  Created: 23-04-2022 18:00 UTC (24-04-2022 02:00 SGST)
  System: Mac OS X 11.3
  Runtime: Groovy 3.0.10 on OpenJDK 64-Bit Server VM 11.0.11+9
  Encoding: UTF-8 (UTF-8)

bounlu, May 16 '22 09:05
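One way to separate the two candidate causes is to ask S3 directly whether the credentials on the Batch instance (the ecsInstanceRole above) can read the reference FASTA. The function below is a minimal sketch, assuming AWS CLI v2 is installed on the host and that the iGenomes bucket is in eu-west-1; `check_reference` and `IGENOMES_REGION` are hypothetical names, not part of sarek or Nextflow. `aws s3api head-object` only fetches metadata, so nothing large is downloaded. A 404 points at a wrong path, while a 403 usually means access is denied (S3 also answers 403 for a missing key when the caller lacks list permission on the bucket).

```shell
#!/bin/bash
# Hypothetical helper: check that the current AWS credentials can read an S3 object.
# Assumption: iGenomes lives in eu-west-1; override with IGENOMES_REGION if needed.
check_reference() {
  local uri="$1"
  local path="${uri#s3://}"   # strip the s3:// scheme
  local bucket="${path%%/*}"  # text before the first slash
  local key="${path#*/}"      # everything after the first slash

  # head-object returns metadata only; the object itself is not downloaded
  if aws s3api head-object \
      --bucket "$bucket" \
      --key "$key" \
      --region "${IGENOMES_REGION:-eu-west-1}" > /dev/null; then
    echo "OK: $uri"
  else
    echo "FAILED: $uri" >&2
    return 1
  fi
}
```

Usage, run on an instance that uses the same role as the compute environment rather than on the laptop that launches the pipeline: `check_reference s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta`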

Hi @bounlu! After reading the issue description again, it occurred to me that the problem might be that you are running from a region other than eu-west-1. Could you also confirm that all the other reference files are staged correctly?

FriederikeHanssen, Jun 14 '22 20:06
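To see whether the run and the reference bucket are actually in different regions, the two regions can be compared from the shell. This is a minimal sketch with hypothetical helper names (`bucket_region`, `same_region_as`); it assumes the region configured for the AWS CLI (`aws configure get region`) is the region the compute environment runs in, which may not hold on every setup. A mismatch does not by itself prove that region is the cause, since other reference files from the same bucket staged fine.

```shell
#!/bin/bash
# Hypothetical helper: print the region an S3 bucket lives in.
bucket_region() {
  local loc
  loc=$(aws s3api get-bucket-location --bucket "$1" \
        --query LocationConstraint --output text) || return 1
  case "$loc" in
    None|"") loc="us-east-1" ;;  # us-east-1 buckets report a null constraint
    EU) loc="eu-west-1" ;;       # legacy alias for eu-west-1
  esac
  echo "$loc"
}

# Hypothetical helper: succeed only if the bucket is in the CLI's configured region.
same_region_as() {
  local bucket_loc local_region
  bucket_loc=$(bucket_region "$1") || return 1
  local_region=$(aws configure get region)
  echo "bucket $1: $bucket_loc, configured region: ${local_region:-unset}"
  [ "$bucket_loc" = "$local_region" ]
}
```

For example, `same_region_as ngi-igenomes` prints both regions and exits non-zero when they differ.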

Yes, indeed, I was running it outside the eu-west-1 region. Is there some kind of licence restriction? The other files had no such issue.

bounlu, Jun 14 '22 23:06

Hm, then it is not that. I'll ask around.

FriederikeHanssen, Jun 15 '22 07:06

Based on the "Task failed to start - DockerTimeoutError: Could not transition to created; timed out after waiting 4m0s" message, I think the problem might not be specific to Mutect2 but caused by something else. How did you set everything up? A quick search for that error led me here: https://aws.amazon.com/premiumsupport/knowledge-center/batch-docker-timeout-error/

FriederikeHanssen, Jun 15 '22 07:06

I've reproduced this and wanted to share some more notes that might otherwise remain buried in Slack:

I used AWS Batch without Tower. I'm doing lots of cold starts and using spot instances, and while a 4-minute wait isn't great, it's no reason to give up either, certainly not given how many distinct jobs Nextflow/sarek launches.

nextflow -c /tmp/sarek-y1lceapy-nfconf.txt run nf-core/sarek -r 3.3.2 -bucket-dir s3://orange9-nf-logs/logs/batch-sarek-None-1696439753/1696439753 -work-dir s3://orange9-nf-runs/work/results/batch-sarek-None-1696439753/1696439753 --input s3://orange9-nf-runs/run/conf/sarek/batch-sarek-None-1696439753/job.csv --outdir s3://orange9-nf-runs/results/batch-sarek-None-1696439753/1696439753 --step mapping --skip_tools baserecalibrator --tools strelka --genome NCBI.GRCh38 --monochrome_logs -ansi-log false

N E X T F L O W  ~  version 23.04.4
Pulling nf-core/sarek ...
 downloaded from https://github.com/nf-core/sarek.git
Launching `https://github.com/nf-core/sarek` [nostalgic_davinci] DSL2 - revision: f034b73763 [3.3.2]
Downloading plugin [email protected]
Downloading plugin [email protected]
Downloading plugin [email protected]
Downloading plugin [email protected]
------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
      ____
    .´ _  `.
   /  |\`-_ \      __        __   ___     
  |   | \  `-|    |__`  /\  |__) |__  |__/
   \ |   \  /     .__| /¯¯\ |  \ |___ |  \
    `|____\´
  nf-core/sarek v3.3.2-gf034b73
------------------------------------------------------
Core Nextflow options
  revision              : 3.3.2
  runName               : nostalgic_davinci
  launchDir             : /tmp/job
  workDir               : /orange9-nf-runs/work/results/batch-sarek-None-1696439753/1696439753
  projectDir            : /home/myp3/.nextflow/assets/nf-core/sarek
  userName              : myp3
  profile               : standard
  configFiles           : 
Input/output options
  input                 : s3://orange9-nf-runs/run/conf/sarek/batch-sarek-None-1696439753/job.csv
  outdir                : s3://orange9-nf-runs/results/batch-sarek-None-1696439753/1696439753
Main options
  no_intervals          : true
  tools                 : strelka
  skip_tools            : baserecalibrator
Reference genome options
  genome                : NCBI.GRCh38
  bwa                   : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/version0.6.0/
  fasta                 : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa
  snpeff_db             : 105
  snpeff_genome         : GRCh38
  vep_genome            : GRCh38
  vep_species           : homo_sapiens
  vep_cache_version     : 110
  igenomes_base         : s3://ngi-igenomes/igenomes
Institutional config options
  config_profile_contact: [email protected]
Generic options
  monochrome_logs       : true
  validationLenientMode : true
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/sarek for your analysis please cite:
* The pipeline
  https://doi.org/10.12688/f1000research.16665.2
  https://doi.org/10.5281/zenodo.3476425
* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
  https://github.com/nf-core/sarek/blob/master/CITATIONS.md
WARN: There's no process matching config selector: NFCORE_SAREK:SAREK:CRAM_QC_NO_MD:SAMTOOLS_STATS
WARN: There's no process matching config selector: APPLYBQSR
WARN: There's no process matching config selector: NFCORE_SAREK:SAREK:MARKDUPLICATES:GATK4_MARKDUPLICATES -- Did you mean: NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES?
[95/3f8592] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_MAP (lane_1-ASAMPLE)
[c0/184f19] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_MAP (lane_1-ASAMPLE)
[d4/e92d43] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:SAMTOOLS_FAIDX (genome.fa)
[66/48217b] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_UNMAP (lane_1-ASAMPLE)
[d1/7a3b14] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_UNMAP (lane_1-ASAMPLE)
[89/344ece] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY (genome.fa)
[99/8af785] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (no_intervals)
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (no_intervals)'
Caused by:
  Task failed to start - DockerTimeoutError: Could not transition to created; timed out after waiting 4m0s
Command executed:
  bgzip  --threads 1 -c  no_intervals.bed > no_intervals.bed.gz
  tabix  no_intervals.bed.gz
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED":
      tabix: $(echo $(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*$//')
  END_VERSIONS
Command exit status:
  -
Command output:
  (empty)
Work dir:
  s3://orange9-nf-logs/logs/batch-sarek-None-1696439753/1696439753/99/8af78583fcf5413062e72607147d4f
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
 -- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Unexpected error [AbortedException]
 -- Check '.nextflow.log' file for details
ERROR ~ Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
 -- Check '.nextflow.log' file for details
WARN: Got an interrupted exception while taking agent result | java.lang.InterruptedException
-[nf-core/sarek] Pipeline completed with errors-
WARN: Killing running tasks (5)
ERROR ~ A DataflowVariable can only be assigned once. Use bind() to allow for equal values to be passed into already-bound variables.
 -- Check '.nextflow.log' file for details
[AWS BATCH] Waiting jobs reaper to complete (1 jobs to be terminated)

The error seems to originate as described here: https://repost.aws/knowledge-center/batch-docker-timeout-error

I'd hope that Nextflow/sarek would recognize this class of failure and retry, instead of aborting a multi-job run.

See also: https://github.com/aws/amazon-ecs-agent/issues/1440

cariaso, Oct 04 '23 18:10
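Until Nextflow or sarek handles this case specifically, one way to approximate "retry instead of abort" is a small custom config passed with `-c`. This is a sketch under an assumption: that a task which never started is reported to Nextflow as an ordinary task error with no exit status (the log above shows "Command exit status: -"), so a retry policy keyed only on specific exit codes, as in the nf-core template's default, would not catch it. The file name `retry.config` is hypothetical, and retrying every failure is a blunt instrument: genuinely broken tasks will also be attempted again before the run stops.

```shell
#!/bin/bash
# Writes a minimal Nextflow config (Groovy syntax) that retries failed tasks.
# RETRY_CONFIG is a hypothetical name; any path passed to -c works.
RETRY_CONFIG="${RETRY_CONFIG:-retry.config}"

cat > "$RETRY_CONFIG" <<'EOF'
process {
    // Retry any failed task, including ones that never started
    // (no exit status); stop after two extra attempts.
    errorStrategy = 'retry'
    maxRetries    = 2
}
EOF

echo "Wrote $RETRY_CONFIG; add '-c $RETRY_CONFIG' to the nextflow run command."
```

Process-specific selectors in the pipeline's own config (`withName`/`withLabel`) still take precedence over this generic `process` scope, so individual processes may keep their own policy.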

Could you share the .nextflow.log file?

adamrtalbot, Oct 04 '23 18:10

@adamrtalbot, responding via Slack.

cariaso, Oct 04 '23 19:10

Allen Zhao (https://nextflow.slack.com/archives/C02T97HAV5M/p1698081984597039?thread_ts=1696443025.639869&cid=C02T97HAV5M): "I found changing ECS_CONTAINER_START_TIMEOUT and ECS_CONTAINER_CREATE_TIMEOUT to be a helpful workaround :)"

cariaso, Oct 23 '23 17:10
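The workaround above raises the ECS container agent's timeouts. On the Linux ECS-optimized AMI these are read from /etc/ecs/ecs.config, and the agent's Linux defaults (4m for create, 3m for start) match the "4m0s" in the error. A common place to set them is the user data of the launch template attached to the Batch compute environment; for AWS Batch that user data must be in MIME multi-part format. The sketch below assumes that layout; `set_ecs_option` and `apply_container_timeouts` are hypothetical helpers, and the 10m/8m values are illustrative, not values recommended in this thread.

```shell
#!/bin/bash
# Hypothetical helper: set KEY=VALUE in an ECS agent config file,
# replacing any existing line for the same key.
set_ecs_option() {
  local file="$1" key="$2" value="$3"
  local tmp
  tmp=$(mktemp)
  # Keep every line except an existing setting for this key
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  echo "${key}=${value}" >> "$tmp"
  mv "$tmp" "$file"
}

# Hypothetical helper: raise both container timeouts (illustrative values).
apply_container_timeouts() {
  local file="${1:-/etc/ecs/ecs.config}"
  set_ecs_option "$file" ECS_CONTAINER_CREATE_TIMEOUT 10m
  set_ecs_option "$file" ECS_CONTAINER_START_TIMEOUT 8m
}
```

In the user data script, define these functions and then call `apply_container_timeouts`. The agent reads the file when it starts, so the change needs to be in place before the agent launches, or be followed by an agent restart.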