sarek
[BUG] Mutect2 fails
I ran sarek as follows:
nextflow run nf-core/sarek \
-profile docker \
-dsl1 \
--sentieon \
--step mapping \
--tools tnscope,mutect2,strelka,snpeff,vep \
--input '/Users/omeran/Desktop/aws/sarek/samplesheet.tsv' \
--outdir 's3://omeran/nextflow/sarek/results/' \
-bucket-dir 's3://omeran/nextflow/sarek/work/' \
-c '/Users/omeran/Desktop/aws/sarek/custom.config' \
-r master
It works fine up to variant calling. However, I keep getting the following error at the Mutect2 step:
Execution cancelled -- Finishing pending tasks before exit
WARN: Got an interrupted exception while taking agent result | java.lang.InterruptedException
Error executing process > 'Mutect2Single (FFLC85_Novaseq_tumour-chr16_46380683-90228345)'
Caused by:
Task failed to start - DockerTimeoutError: Could not transition to created; timed out after waiting 4m0s
Command executed:
# Get raw calls
gatk --java-options "-Xmx7g" Mutect2 -R Homo_sapiens_assembly38.fasta -I FFLC85_Novaseq_tumour.recal.bam -tumor FFLC85_Novaseq_tumour -L chr16_46380683-90228345.bed --germline-resource gnomAD.r2.1.1.GRCh38.PASS.AC.AF.only.vcf.gz -O chr16_46380683-90228345_FFLC85_Novaseq_tumour.vcf
Command exit status:
1
Command output:
(empty)
Command wrapper:
nxf-scratch-dir ip-172-31-45-123.ap-southeast-1.compute.internal:/tmp/nxf.8LordsTTl5
An error occurred (AllAccessDisabled) when calling the ListObjectsV2 operation: All access to this object has been disabled
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
An error occurred (AllAccessDisabled) when calling the ListObjectsV2 operation: All access to this object has been disabled
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
download failed: s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta to ./Homo_sapiens_assembly38.fasta [Errno 12] Cannot allocate memory
main: line 266: 112 Killed /home/ec2-user/.awscliv2/binaries/aws s3 cp --only-show-errors "$source" "$target"
Work dir:
s3://omeran/nextflow/sarek/work/6b/0140014bdc844aebf4ce63b3cdb2b7
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
Unexpected error [AbortedException]
Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
-[nf-core/sarek] Pipeline completed with errors-
WARN: Killing running tasks (271)
When I looked up the error, I found two possible causes:
- Wrong IAM permissions -> here
I verified this is not the case for my own buckets, as the relevant role (ecsInstanceRole) has AmazonS3FullAccess.
However, I am not sure whether this also covers access to the iGenomes bucket.
- Non-existent folder path -> here
I am not sure how to fix this, as I checked the path to the reference FASTA file and it exists.
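One way to tell the two possibilities apart is to check, from a machine using the same IAM role and region as the Batch compute environment, whether each reference object can actually be listed. The sketch below assumes the AWS CLI is installed and configured; the function name `check_s3_refs` is hypothetical, and it treats any non-zero exit from `aws s3 ls` as "not reachable" without distinguishing a 403 from a missing object.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each S3 URI is reachable with the
# current AWS credentials and region. `aws s3 ls` exits non-zero when the
# object does not exist or access is denied; both are reported as FAILED.
check_s3_refs() {
  local uri failed=0
  for uri in "$@"; do
    if aws s3 ls "$uri" > /dev/null 2>&1; then
      echo "OK      $uri"
    else
      echo "FAILED  $uri"
      failed=1
    fi
  done
  # Non-zero if at least one URI could not be listed
  return "$failed"
}
```

For example, `check_s3_refs s3://ngi-igenomes/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta` checks the FASTA from the failing download above; running it once locally and once on an instance in the compute environment helps separate a path problem from a permissions or region problem.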
$ nextflow info
Version: 22.04.0 build 5697
Created: 23-04-2022 18:00 UTC (24-04-2022 02:00 SGST)
System: Mac OS X 11.3
Runtime: Groovy 3.0.10 on OpenJDK 64-Bit Server VM 11.0.11+9
Encoding: UTF-8 (UTF-8)
Hi @bounlu! After reading the issue description again, it occurred to me that the problem may be that you are running from a region other than eu-west-1. However, can you confirm that all the other reference files are staged correctly?
Yes, indeed I was running it outside the eu-west-1 region. Is there some kind of licence restriction? The other files had no such issue.
Hm, then it is not that. I'll ask around.
Based on the "Task failed to start - DockerTimeoutError: Could not transition to created; timed out after waiting 4m0s" error, I am thinking it might not be Mutect2-specific but something else. How did you set everything up? A quick search for that error led me here: https://aws.amazon.com/premiumsupport/knowledge-center/batch-docker-timeout-error/
I've reproduced this and wanted to share some more notes that might otherwise remain buried in Slack:
I used AWS Batch without Tower. I'm doing lots of cold starts and spot instances, and while a 4-minute container start isn't great, it's no reason to give up either, certainly not given how many distinct jobs nextflow/sarek launches.
nextflow -c /tmp/sarek-y1lceapy-nfconf.txt run nf-core/sarek -r 3.3.2 -bucket-dir s3://orange9-nf-logs/logs/batch-sarek-None-1696439753/1696439753 -work-dir s3://orange9-nf-runs/work/results/batch-sarek-None-1696439753/1696439753 --input s3://orange9-nf-runs/run/conf/sarek/batch-sarek-None-1696439753/job.csv --outdir s3://orange9-nf-runs/results/batch-sarek-None-1696439753/1696439753 --step mapping --skip_tools baserecalibrator --tools strelka --genome NCBI.GRCh38 --monochrome_logs -ansi-log false
N E X T F L O W ~ version 23.04.4
Pulling nf-core/sarek ...
downloaded from https://github.com/nf-core/sarek.git
Launching `https://github.com/nf-core/sarek` [nostalgic_davinci] DSL2 - revision: f034b73763 [3.3.2]
Downloading plugin [email protected]
Downloading plugin [email protected]
Downloading plugin [email protected]
Downloading plugin [email protected]
------------------------------------------------------
nf-core/sarek v3.3.2-gf034b73
------------------------------------------------------
Core Nextflow options
revision : 3.3.2
runName : nostalgic_davinci
launchDir : /tmp/job
workDir : /orange9-nf-runs/work/results/batch-sarek-None-1696439753/1696439753
projectDir : /home/myp3/.nextflow/assets/nf-core/sarek
userName : myp3
profile : standard
configFiles :
Input/output options
input : s3://orange9-nf-runs/run/conf/sarek/batch-sarek-None-1696439753/job.csv
outdir : s3://orange9-nf-runs/results/batch-sarek-None-1696439753/1696439753
Main options
no_intervals : true
tools : strelka
skip_tools : baserecalibrator
Reference genome options
genome : NCBI.GRCh38
bwa : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/version0.6.0/
fasta : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa
snpeff_db : 105
snpeff_genome : GRCh38
vep_genome : GRCh38
vep_species : homo_sapiens
vep_cache_version : 110
igenomes_base : s3://ngi-igenomes/igenomes
Institutional config options
config_profile_contact: [email protected]
Generic options
monochrome_logs : true
validationLenientMode : true
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/sarek for your analysis please cite:
* The pipeline
https://doi.org/10.12688/f1000research.16665.2
https://doi.org/10.5281/zenodo.3476425
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/sarek/blob/master/CITATIONS.md
WARN: There's no process matching config selector: NFCORE_SAREK:SAREK:CRAM_QC_NO_MD:SAMTOOLS_STATS
WARN: There's no process matching config selector: APPLYBQSR
WARN: There's no process matching config selector: NFCORE_SAREK:SAREK:MARKDUPLICATES:GATK4_MARKDUPLICATES -- Did you mean: NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES?
[95/3f8592] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_MAP (lane_1-ASAMPLE)
[c0/184f19] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_MAP (lane_1-ASAMPLE)
[d4/e92d43] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:SAMTOOLS_FAIDX (genome.fa)
[66/48217b] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_UNMAP (lane_1-ASAMPLE)
[d1/7a3b14] Submitted process > NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_UNMAP (lane_1-ASAMPLE)
[89/344ece] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY (genome.fa)
[99/8af785] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (no_intervals)
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (no_intervals)'
Caused by:
Task failed to start - DockerTimeoutError: Could not transition to created; timed out after waiting 4m0s
Command executed:
bgzip --threads 1 -c no_intervals.bed > no_intervals.bed.gz
tabix no_intervals.bed.gz
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED":
tabix: $(echo $(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*$//')
END_VERSIONS
Command exit status:
-
Command output:
(empty)
Work dir:
s3://orange9-nf-logs/logs/batch-sarek-None-1696439753/1696439753/99/8af78583fcf5413062e72607147d4f
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Unexpected error [AbortedException]
-- Check '.nextflow.log' file for details
ERROR ~ Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
-- Check '.nextflow.log' file for details
WARN: Got an interrupted exception while taking agent result | java.lang.InterruptedException
-[nf-core/sarek] Pipeline completed with errors-
WARN: Killing running tasks (5)
ERROR ~ A DataflowVariable can only be assigned once. Use bind() to allow for equal values to be passed into already-bound variables.
-- Check '.nextflow.log' file for details
[AWS BATCH] Waiting jobs reaper to complete (1 jobs to be terminated)
The error seems to originate as described here: https://repost.aws/knowledge-center/batch-docker-timeout-error
I'd hope that nextflow/sarek would recognize this class of failure and retry, instead of aborting a multi-job run.
See also: https://github.com/aws/amazon-ecs-agent/issues/1440
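Until the underlying timeout is addressed, a blanket retry in a custom config is one way to keep a single failed container start from aborting the whole run. nf-core pipelines typically retry only on specific exit codes, and a task that never started has no exit status (shown as "-" in the log above), so it falls through to "finish". The sketch below writes a small Nextflow config using the standard `errorStrategy` and `maxRetries` process directives; the file name `retry.config` and the retry count are arbitrary assumptions, and it assumes AWS Batch start-up failures are routed through Nextflow's normal error strategy. An unconditional retry also re-runs tasks that fail for genuine reasons, so treat it as a stopgap.

```shell
#!/usr/bin/env bash
# Write a hypothetical override config that retries every failed task,
# including tasks that never started (e.g. DockerTimeoutError).
cat > retry.config <<'EOF'
process {
    // Retry any failure instead of finishing the run after the first one
    errorStrategy = 'retry'
    maxRetries    = 3
}
EOF
```

Pass the file to the run with `-c retry.config`, or append its `process` block to the config already passed with `-c`. Process selectors in the pipeline's own config (for example `withLabel`) still take precedence over this generic setting.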
Could you share the .nextflow.log file?
@adamrtalbot, responding via Slack:
Allen Zhao (https://nextflow.slack.com/archives/C02T97HAV5M/p1698081984597039?thread_ts=1696443025.639869&cid=C02T97HAV5M): I found changing ECS_CONTAINER_START_TIMEOUT and ECS_CONTAINER_CREATE_TIMEOUT to be a helpful workaround :)
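The workaround above raises two Amazon ECS container agent settings. On Linux their documented defaults are 4m for ECS_CONTAINER_CREATE_TIMEOUT (matching the "timed out after waiting 4m0s" in both logs) and 3m for ECS_CONTAINER_START_TIMEOUT. The agent reads them from /etc/ecs/ecs.config when it starts, so on AWS Batch they are usually set from the compute environment's launch template user data, which Batch expects in MIME multi-part format. The shell part of such a script might look like the sketch below; the function name `set_ecs_timeouts` and the 10m default are illustrative assumptions, not tested recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical helper for launch-template user data: append longer
# container create/start timeouts to the ECS agent config file.
set_ecs_timeouts() {
  local cfg="${1:-/etc/ecs/ecs.config}"  # ECS agent config file
  local timeout="${2:-10m}"              # illustrative value only
  mkdir -p "$(dirname "$cfg")"
  # Appends; existing keys in the file are not removed or edited
  {
    echo "ECS_CONTAINER_CREATE_TIMEOUT=${timeout}"
    echo "ECS_CONTAINER_START_TIMEOUT=${timeout}"
  } >> "$cfg"
}
```

In the user data itself, calling `set_ecs_timeouts` with no arguments writes to /etc/ecs/ecs.config before the agent starts. Instances already running need to be replaced (or the agent restarted) for new values to take effect.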