sarek
sarek copied to clipboard
mutect2 error with tumor-normal pairs starting from BAM files
Description of the bug
Hi, I run into the following error with running mutect2 via sarek:
INFO: Converting SIF file to temporary sandbox...
Using GATK jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx29491M -XX:-UsePerfData -jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar Mutect2 --input patient1-10A.converted.cram --input patient1-01A.converted.cram --output patient1-01A_vs_patient1-10A.mutect2.vcf.gz --reference GRCh38.d1.vd1.fa --intervals chr17_7651780-7697538.bed --tmp-dir . --f1r2-tar-gz patient1-01A_vs_patient1-10A.mutect2.f1r2.tar.gz --normal-sample patient1_patient1-10A
15:23:13.503 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:23:13.529 INFO Mutect2 - ------------------------------------------------------------
15:23:13.531 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.4.0.0
15:23:13.531 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
15:23:13.531 INFO Mutect2 - Executing as wij20zul@hilbert166 on Linux v3.10.0-1160.49.1.el7.x86_64 amd64
15:23:13.531 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src
15:23:13.532 INFO Mutect2 - Start Date/Time: April 15, 2024 at 3:23:13 PM GMT
15:23:13.532 INFO Mutect2 - ------------------------------------------------------------
15:23:13.532 INFO Mutect2 - ------------------------------------------------------------
15:23:13.532 INFO Mutect2 - HTSJDK Version: 3.0.5
15:23:13.532 INFO Mutect2 - Picard Version: 3.0.0
15:23:13.533 INFO Mutect2 - Built for Spark Version: 3.3.1
15:23:13.533 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:23:13.533 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:23:13.533 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:23:13.533 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:23:13.533 INFO Mutect2 - Deflater: IntelDeflater
15:23:13.533 INFO Mutect2 - Inflater: IntelInflater
15:23:13.533 INFO Mutect2 - GCS max retries/reopens: 20
15:23:13.533 INFO Mutect2 - Requester pays: disabled
15:23:13.534 INFO Mutect2 - Initializing engine
15:23:14.107 INFO FeatureManager - Using codec BEDCodec to read file chr17_7651780-7697538.bed
15:23:14.111 INFO IntervalArgumentCollection - Processing 129382 bp from intervals
15:23:14.119 INFO Mutect2 - Done initializing engine
15:23:14.149 INFO Mutect2 - Shutting down engine
[April 15, 2024 at 3:23:14 PM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=285212672
***********************************************************************
A USER ERROR has occurred: Bad input: Sample patient1_patient1-10A is not in BAM header: [patient1-01A-31W-A062-09, patient1-10A-01W-A062-09]
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
INFO: Cleaning up image...
It seems that there is something wrong with the mutect2 command in this configuration.
It defines a normal sample which cannot be found by mutect2 (from the .command.sh):
#!/bin/bash -euo pipefail
gatk --java-options "-Xmx29491M -XX:-UsePerfData" \
Mutect2 \
--input patient1-10A.converted.cram --input patient1-01A.converted.cram \
--output patient1-01A_vs_patient1-10A.mutect2.vcf.gz \
--reference GRCh38.d1.vd1.fa \
\
\
--intervals chr17_7651780-7697538.bed \
--tmp-dir . \
--f1r2-tar-gz patient1-01A_vs_patient1-10A.mutect2.f1r2.tar.gz --normal-sample patient1_patient1-10A
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:MUTECT2_PAIRED":
gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS
Freebayes and strelka run fine without an error message.
I have the following configuration:
- Starting variant calling from aligned BAM files.
- always having tumor-normal pairs
- targeted sequencing
Can you help me fixing this problem?
Command used and terminal output
nf-params.json:
{
"step": "variant_calling",
"outdir": "results/",
"wes": true,
"intervals": "targets.bed",
"tools": "msisensorpro,mutect2,freebayes,strelka",
"genome": "GRCh38",
"fasta": "GRCh38.d1.vd1.fa",
"fasta_fai": "GRCh38.d1.vd1.fa.fai",
"save_reference": true,
"save_mapped": false,
"igenomes_ignore": true,
}
samplesheet (example):
patient,status,sample,bam,bai
patient1,1,patient1-01A,sample1_wxs_gdc_realn.bam,sample1_wxs_gdc_realn.bai
patient1,0,patient1-10A,sample2_wxs_gdc_realn.bam,sample2_wxs_gdc_realn.bai
run command:
nextflow run nf-core-sarek_3.4.0/3_4_0 \
-profile singularity,hilbert \
-c config/run-sarek.config \
-params-file config/nf-params.json \
-plugins [email protected] \
--input samplesheet.csv
### Relevant files
_No response_
### System information
nf-core sarek verison 3.4.0
I had a similar issue on 3.4.1, where it gave me an error about not having a tumor-sample which was required by mutect2, I added all the possible columns (icluding sex) and it fixed the bug, it might not be parsing the sample table properly. Try with that and see if it fixes it for you. It is a different error, once I am back on my rig I can update this with the error log.
Has this been resolved?
I have an almost identical issue, also starting with variant calling from aligned bam files, and Mutect2 is also complaining that a sample isn't in the bam header. Has anyone been able to resolve this?
My samplesheet contains patient,sex,status,sample,bam,bai
Hi, so I could resolve this error as following:
The problem here is the mismatch between the sample names in the BAM header (when starting with BAM files) and the sample names specified in the mutect2 parameters defined in conf/modules/mutect2.config.
conf/modules/mutect2.config contains in the process scope:
withName: 'MUTECT2_PAIRED' {
ext.args = { params.ignore_soft_clipped_bases ?
"--dont-use-soft-clipped-bases true --f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.patient}_${meta.normal_id}" :
"--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.patient}_${meta.normal_id}" }
}
I overwrote this in the process scope of a separate config file provided with -c so that is fits the sample names in my BAM files:
withName: 'MUTECT2_PAIRED' {
ext.args = { params.ignore_soft_clipped_bases ?
"--dont-use-soft-clipped-bases true --f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" :
"--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" }
}
So you either change this config file to fit your BAM sample names or another solution would be to rename the BAM samplenames, e.g. with samtools, so it fits the fixed naming scheme in the mutect2.config file.
Thank you so much! changing mutect2.config worked :-)
Is this also a problem with Sarek v3.4.2?
Yes, I'm running nextflow run nf-core/sarek -r 3.4.2
Hi! YEs this is a known issue with mutect because it determines what the normal file is not by the file it is provided with but by this id parameter. I will make a note to update the docs to more clearly explain how to run mutect2 when files are not geenrated with sarek
There is a related issue #1455. I am closing this issue and keep tracking this feature there.