Sarek parabricks gpu test_full: A USER ERROR has occurred: Bad input: Sample HCC1395_HCC1395N is not in BAM header: [sample]
Description of the bug
Using the test_full and gpu profiles for Sarek, alignment works properly but Mutect2 fails with the following error:
[May 13, 2025 at 2:55:16 PM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1224736768
***********************************************************************
A USER ERROR has occurred: Bad input: Sample HCC1395_HCC1395N is not in BAM header: [sample]
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
[sgrossfeld@ab-rnd-8-gpu 30171845a60313cec71b3ce4a90266]$ more .command.sh
#!/usr/bin/env bash -C -e -u -o pipefail
gatk --java-options "-Xmx29491M -XX:-UsePerfData" \
Mutect2 \
--input HCC1395N-1.cram --input HCC1395T-1.cram \
--output HCC1395T_vs_HCC1395N.mutect2.chr17_38877931-38878296.vcf.gz \
--reference Homo_sapiens_assembly38.fasta \
--panel-of-normals 1000g_pon.hg38.vcf.gz \
--germline-resource af-only-gnomad.hg38.vcf.gz \
--intervals chr17_38877931-38878296.bed \
--tmp-dir . \
--f1r2-tar-gz HCC1395T_vs_HCC1395N.mutect2.chr17_38877931-38878296.f1r2.tar.gz --normal-sample HCC1395_HCC1395N
cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:MUTECT2_PAIRED":
gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS
[sgrossfeld@ab-rnd-8-gpu 30171845a60313cec71b3ce4a90266]$
Command used and terminal output
nextflow run nf-core/sarek -r dev -profile singularity,test_full,gpu --aligner parabricks --outdir gpu_run_test -resume
Relevant files
No response
System information
No response
Hey @sgrossfeld-arcusbio, I've tried to kick off a reproduction: https://cloud.seqera.io/orgs/nf-core/workspaces/AWSmegatests/watch/2KhCJEHXjcFcRn/v2/logs
When you get a chance, could you confirm that the parameters and config all look correct to reproduce your issue?
Working on the module to include readgroups here: https://github.com/nf-core/modules/pull/8624
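For context, Mutect2 looks up the value passed to --normal-sample among the SM fields of the @RG lines in the input headers, and the error above reports that the only sample it found was "sample". Below is a minimal sketch of that check, assuming the header text has already been dumped to a string (for example with `samtools view -H`). The function names and the example headers are hypothetical and only illustrate the mismatch; this is not the module's actual code.

```python
def read_group_samples(header_text):
    """Collect the SM values found on @RG lines of a SAM/BAM/CRAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # @RG fields are tab-separated TAG:VALUE pairs
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


def normal_sample_in_header(header_text, normal_sample):
    """True if the name given to --normal-sample appears as an SM value."""
    return normal_sample in read_group_samples(header_text)


# Hypothetical headers: one resembling the failing case, one with the expected SM
failing_header = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:rg1\tSM:sample\tPL:ILLUMINA\n"
fixed_header = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:rg1\tSM:HCC1395_HCC1395N\tPL:ILLUMINA\n"

print(normal_sample_in_header(failing_header, "HCC1395_HCC1395N"))  # False
print(normal_sample_in_header(fixed_header, "HCC1395_HCC1395N"))    # True
```

If the first call returns False on a real header, the read group written during alignment does not carry the sample name that Sarek passes to Mutect2, which matches the reported error.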
Hey @sgrossfeld-arcusbio, could you try the fix I proposed in https://github.com/nf-core/sarek/pull/1925 and see if that works for you? I am running it now with:
nextflow run famosab/sarek -r fix/fq2bam -profile singularity,test_full,gpu --aligner parabricks --outdir gpu_run_test -resume
nextflow_log_sarek_gpu_july_1.txt
Hi @famosab, thank you for working on this!
I ran your changes. Interestingly, the run didn't reach Mutect2; instead it ran ASCAT and hit this error:
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Loading required package: splines
Loading required package: data.table
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, aperm, append, as.data.frame, basename, cbind,
colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
table, tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:data.table’:
first, second
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Attaching package: ‘IRanges’
The following object is masked from ‘package:data.table’:
shift
Loading required package: GenomeInfoDb
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Warning message:
package ‘ASCAT’ was built under R version 4.2.3
Error in readAlleleCountFiles(tumourAlleleCountsFile.prefix, ".txt", chrom_names, :
length(files) > 0 is not TRUE
Calls: ascat.prepareHTS ... ascat.getBAFsAndLogRs -> readAlleleCountFiles -> stopifnot
Execution halted
It seems to be complaining that the allele count files are missing.
I'm now trying to run with --tools mutect2.
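The `length(files) > 0 is not TRUE` message comes from a `stopifnot` inside `readAlleleCountFiles`, so one way to narrow this down is to check which per-chromosome allele count files actually exist in the task work directory. Below is a rough sketch, assuming the files are named prefix + chromosome name + ".txt" and that empty files are ignored. That naming rule, the function name, and the example prefix are assumptions for illustration, not something confirmed from the ASCAT source in this thread.

```python
import os


def existing_allele_count_files(prefix, chrom_names, suffix=".txt"):
    """Return the candidate allele count files that exist and are non-empty.

    Assumption: one file per chromosome, named <prefix><chrom><suffix>.
    """
    candidates = [f"{prefix}{chrom}{suffix}" for chrom in chrom_names]
    return [p for p in candidates if os.path.isfile(p) and os.path.getsize(p) > 0]


if __name__ == "__main__":
    # Hypothetical prefix and chromosome names; adjust to the actual work directory
    chroms = [str(i) for i in range(1, 23)] + ["X"]
    found = existing_allele_count_files("HCC1395T_alleleFrequencies_chr", chroms)
    print(f"{len(found)} allele count file(s) found")
```

An empty result would be consistent with the failure above, and would suggest that the allele counting step produced no output or wrote it under a different prefix or chromosome naming.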
Hey @famosab,
With --tools mutect2 your fix worked, so it seems there is still an unrelated error in test_full relating to ASCAT.
Command:
nextflow run famosab/sarek -r fix/fq2bam -profile singularity,test_full,gpu --aligner parabricks --outdir gpu_run_test --tools mutect2
@SPPearce Any ideas on the ASCAT error? Is that related to other errors that we see or does this need some kind of parabricks fix?
Probably just an issue with ASCAT, not related to parabricks.