Sarek parabricks gpu test_full: A USER ERROR has occurred: Bad input: Sample HCC1395_HCC1395N is not in BAM header: [sample]

Open sgrossfeld-arcusbio opened this issue 7 months ago • 3 comments

Description of the bug

Using the test_full profile together with the gpu profile for Sarek, alignment works properly, but Mutect2 fails with the following error:



[May 13, 2025 at 2:55:16 PM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1224736768
***********************************************************************

A USER ERROR has occurred: Bad input: Sample HCC1395_HCC1395N is not in BAM header: [sample]

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
[sgrossfeld@ab-rnd-8-gpu 30171845a60313cec71b3ce4a90266]$ more .command.sh
#!/usr/bin/env bash -C -e -u -o pipefail
gatk --java-options "-Xmx29491M -XX:-UsePerfData" \
    Mutect2 \
    --input HCC1395N-1.cram --input HCC1395T-1.cram \
    --output HCC1395T_vs_HCC1395N.mutect2.chr17_38877931-38878296.vcf.gz \
    --reference Homo_sapiens_assembly38.fasta \
    --panel-of-normals 1000g_pon.hg38.vcf.gz \
    --germline-resource af-only-gnomad.hg38.vcf.gz \
    --intervals chr17_38877931-38878296.bed \
    --tmp-dir . \
    --f1r2-tar-gz HCC1395T_vs_HCC1395N.mutect2.chr17_38877931-38878296.f1r2.tar.gz --normal-sample HCC1395_HCC1395N

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:MUTECT2_PAIRED":
    gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS
[sgrossfeld@ab-rnd-8-gpu 30171845a60313cec71b3ce4a90266]$
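
The error says that the name passed to --normal-sample (HCC1395_HCC1395N) does not match any sample name in the input headers, which list only `sample`. One way to confirm this is to look at the SM field of the @RG header lines. The following is a minimal sketch, not part of Sarek: the helper names `rg_samples` and `header_has_sample` are hypothetical, and it assumes `samtools` is available to dump the header.

```shell
# Print the unique SM (sample) values found on @RG lines of a SAM header.
# Reads header text on stdin, e.g. from: samtools view -H HCC1395N-1.cram
rg_samples() {
    awk -F'\t' '
        /^@RG/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)
        }' | sort -u
}

# Succeed only if the expected sample name is one of the SM values.
# (Hypothetical helper for illustration.)
header_has_sample() {
    rg_samples | grep -Fxq -- "$1"
}
```

For example, `samtools view -H HCC1395N-1.cram | rg_samples` lists the sample names that Mutect2 would see, and `samtools view -H HCC1395N-1.cram | header_has_sample HCC1395_HCC1395N` returns non-zero when the read group does not carry the expected name.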

Command used and terminal output

nextflow run nf-core/sarek -r dev -profile singularity,test_full,gpu --aligner parabricks --outdir gpu_run_test -resume

Relevant files

No response

System information

No response

sgrossfeld-arcusbio avatar May 29 '25 22:05 sgrossfeld-arcusbio

Hey @sgrossfeld-arcusbio, I've tried to kick off a reproduction: https://cloud.seqera.io/orgs/nf-core/workspaces/AWSmegatests/watch/2KhCJEHXjcFcRn/v2/logs

When you get a chance, could you confirm that the parameters config all look correct to reproduce your issue?

edmundmiller avatar Jun 06 '25 14:06 edmundmiller

Working on the module to include readgroups here: https://github.com/nf-core/modules/pull/8624

famosab avatar Jun 16 '25 12:06 famosab

Hey @sgrossfeld-arcusbio could you try the fix I proposed in https://github.com/nf-core/sarek/pull/1925 and see if that works for you? I am running it now with

nextflow run famosab/sarek -r fix/fq2bam -profile singularity,test_full,gpu --aligner parabricks  --outdir gpu_run_test -resume

famosab avatar Jun 20 '25 10:06 famosab

nextflow_log_sarek_gpu_july_1.txt

Hi @famosab, thank you for working on this!

I ran your changes. Interestingly, the run didn't fail at Mutect2 this time; instead it ran ASCAT and hit this error:

INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Loading required package: splines
Loading required package: data.table
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:data.table’:

    first, second

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:data.table’:

    shift

Loading required package: GenomeInfoDb
Loading required package: parallel
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Warning message:
package ‘ASCAT’ was built under R version 4.2.3 
Error in readAlleleCountFiles(tumourAlleleCountsFile.prefix, ".txt", chrom_names,  : 
  length(files) > 0 is not TRUE
Calls: ascat.prepareHTS ... ascat.getBAFsAndLogRs -> readAlleleCountFiles -> stopifnot
Execution halted

It seems to be complaining that the allele count files are missing.

I'm now trying a run with --tools mutect2.
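
The traceback shows `readAlleleCountFiles` being called with a file prefix, the suffix ".txt", and a list of chromosome names, then stopping because no files were found. A rough way to see which files are absent from the task work directory is sketched below. It assumes the file names are built as prefix + chromosome name + ".txt", which is inferred from that call and not confirmed here; `check_allele_counts` and the prefix in the usage note are hypothetical.

```shell
# Count per-chromosome allele count files that exist and are non-empty.
# Usage: check_allele_counts PREFIX CHROM...
# Assumes names of the form PREFIX + CHROM + ".txt" (an inference from the
# traceback, not verified against ASCAT).
check_allele_counts() {
    prefix="$1"
    shift
    found=0
    for chrom in "$@"; do
        f="${prefix}${chrom}.txt"
        if [ -s "$f" ]; then
            found=$((found + 1))
        else
            echo "missing or empty: $f" >&2
        fi
    done
    # Print the number of usable files; 0 corresponds to the failing check.
    echo "$found"
}
```

Run from the failed task's work directory, something like `check_allele_counts my_tumour_alleleFrequencies_chr 1 2 3` (with the real prefix substituted) prints how many files were found and reports the missing ones on stderr.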

sgrossfeld-arcusbio avatar Jul 01 '25 21:07 sgrossfeld-arcusbio

Hey @famosab

With --tools mutect2 your fix worked, so it seems there is still an unrelated error in test_full relating to ASCAT.

Command:

nextflow run famosab/sarek -r fix/fq2bam -profile singularity,test_full,gpu --aligner parabricks --outdir gpu_run_test --tools mutect2

sgrossfeld-arcusbio avatar Jul 02 '25 16:07 sgrossfeld-arcusbio

@SPPearce Any ideas on the ASCAT error? Is that related to other errors that we see or does this need some kind of parabricks fix?

famosab avatar Jul 14 '25 08:07 famosab

Probably just an issue with ASCAT, not related to parabricks.

SPPearce avatar Jul 14 '25 09:07 SPPearce