Issue: FASTQC Process Fails with Exit Code 140 in nf-core/sarek Pipeline Using Singularity
Description of the bug
The nf-core/sarek pipeline is consistently failing during the FASTQC process with exit code 140 when executed on a Slurm-based HPC cluster using Singularity. The same issue occurs when using Docker as the container runtime.
Additional context and observations:
- The error occurs during the `FASTQC` process in both Singularity and Docker executions.
- Other pipelines such as nf-core/rnaseq run without issues in the same environment.
- Running the pipeline with root privileges also fails.
- The warning `Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf` appears, but may not be directly related to the issue.
- The Java I/O error `java.io.IOException: Bad file descriptor` suggests possible file-handling issues within the container.
- The error persists even when Singularity is correctly configured and verified with other workflows.
Request for assistance:
I am seeking help to resolve this issue with the FASTQC process in the nf-core/sarek pipeline. Any guidance on addressing the exit code 140 error would be greatly appreciated, particularly:
- Is this a known issue with FASTQC in nf-core/sarek?
- Could the `java.io.IOException: Bad file descriptor` indicate an underlying issue in the pipeline or the environment?
- Are there specific settings or configurations required for running this pipeline with Singularity on Slurm?
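For context, my current working hypothesis (an assumption on my part, not confirmed) is that exit status 140 means Slurm signalled the task after it hit a resource limit, since nf-core's base config treats 140 as a retryable exit status. A minimal override I am considering for `nextflow.conf`, with illustrative values and the process name taken from the error output below:

```groovy
// Hypothetical override: give the failing FASTQC task more headroom
// and retry it instead of failing the run outright.
process {
    withName: 'NFCORE_SAREK:SAREK:FASTQC' {
        cpus          = 4
        memory        = '16.GB'
        time          = '12.h'
        errorStrategy = 'retry'
        maxRetries    = 2
    }
}
```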
Also posted in the nf-core Slack channel: https://nfcore.slack.com/archives/CE6SDBX2A/p1728540986179659
The command, terminal output, and relevant files are included below.
Command used and terminal output
Command Executed:
```bash
nextflow run nf-core/sarek -profile singularity --input samplesheet.csv --genome hg38 -r 3.2.3 -c nextflow.conf --outdir results_output --wes --known_indels Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tools mutect2,snpeff --resume
```
Error Output:
Failed Process: NFCORE_SAREK:SAREK:FASTQC (Sample-1)
Command Executed:
```bash
printf "%s %s\n" DNA_Sample-1.R1.fastq.gz Sample-1_1.gz DNA_Sample-1.R2.fastq.gz Sample-1_2.gz | while read old_name new_name; do
    [ -f "${new_name}" ] || ln -s $old_name $new_name
done

fastqc --quiet --threads 8 Sample-1_1.gz Sample-1_2.gz

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:FASTQC":
    fastqc: $( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
```
Exit Code: 140
Command Error:
```
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
java.io.IOException: Bad file descriptor
```
Relevant files
The following is the script used to launch the job (paths and personal information generalized for privacy):
```bash
#!/bin/bash
#SBATCH --job-name=CARIS_singularity # Job name
#SBATCH -p long
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --ntasks=1 # Run on a single CPU
#SBATCH --mem=10G # Job memory request
#SBATCH --cpus-per-task=1
#SBATCH --output=%x_%j_nobed.log # Standard output and error log
#SBATCH --error=%x_%j_nobed.err
samples="samplesheet.csv"
sarekoutput="results_$SLURM_JOB_NAME"
logdir="/path/to/log/"
logfile="$SLURM_JOB_NAME.txt"
pon="/path/to/required/1000g_pon.hg38.vcf.gz"
pon_tbi="/path/to/required/1000g_pon.hg38.vcf.gz.tbi"
known_indels="/path/to/required/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz"
other=" --germline_resource /path/to/required/af-only-gnomad.raw.sites_mod.vcf.gz --germline_resource_tbi /path/to/required/af-only-gnomad.raw.sites_mod.vcf.gz.tbi --pon $pon --pon_tbi $pon_tbi"
maxmem="256.GB"
igenomes="/path/to/required/"
max_cpu="48"
max_time="600.h"
tools="mutect2,snpeff"
cmd="nextflow run nf-core/sarek -profile singularity --input $samples --genome hg38 -r 3.2.3 -c nextflow.conf --outdir $sarekoutput --wes --known_indels $known_indels --trim_fastq --resume --tools $tools $other"
# Create cache directory if it doesn't exist
if [ ! -d "cache" ]; then
mkdir cache
fi
# Create nextflow config file
read -r -d '' config <<- EOM
params {
config_profile_description = 'bioinfo config'
config_profile_contact = '$SLURM_JOB_USER [email protected]'
}
singularity {
enabled = true
autoMounts = true
cacheDir ='./cache/'
}
executor {
name = 'slurm'
queueSize = 12
}
process {
executor = 'slurm'
queue = { task.time <= 5.h && task.memory <= 10.GB ? 'short': (task.memory <= 95.GB ? 'long' : 'highmem')}
queueSize = 12
}
params {
max_memory = '$maxmem'
max_cpus = $max_cpu
max_time = '$max_time'
}
EOM
echo "$config" > nextflow.conf
# Create log file
message=$(date +"%D %T")" "$(whoami)" "$SLURM_JOB_NAME" "$cmd
echo $message >> $logdir$logfile
# Execute nextflow command
nextflow run nf-core/sarek -profile singularity --input $samples --genome GATK.GRCh38 -r 3.2.3 -c nextflow.conf --outdir $sarekoutput --wes --known_indels $known_indels --trim_fastq --resume --tools mutect2,snpeff
```
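One thing I am unsure about in the generated config is the relative `cacheDir = './cache/'`. If it matters, an absolute cache path on shared storage (as generally recommended for Singularity on clusters) would look like this; the path is a placeholder:

```groovy
// Hypothetical alternative: absolute Singularity cache on shared storage
// instead of the relative './cache/' used above.
singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = '/path/to/shared/singularity_cache'
}
```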
System information
- Nextflow version: 24.04.3
- Hardware: HPC
- Executor: Slurm
- Container engine: Singularity 3.11.0, Docker 24.0.7
- OS: Ubuntu 22.04 (Jammy Jellyfish)
- nf-core/sarek version: 3.2.3
I have attached the final stdout message and the stderr. The output logs show the message described earlier in the screenshot, and in the .err file I am seeing a "missing txt file" error which I do not recognize.
The job ran for 24 hours with a couple of failed jobs, and still produced 2 TB of output in the work directories.
Script Location: The entire script and more details are available on GitHub at this issue link.
Hey guys, I created a draft pull request so I could ask you about the best possible implementation of this feature. I've put my thoughts on it in the PR description in the Challenges section.
@Slamdunk Hey! 👋 We started using OpenSpout in our project. Being able to collapse columns (in XLSX files) is an important feature for us, and I think it is useful for other OpenSpout users as well.
We're currently using our fork of OpenSpout with this feature included. However, we'd prefer to get it merged, so it will be part of the official release and we don't need to keep using our own fork.
This current PR is not completely ready, because we were not sure what would be the best approach. My colleague @kamilrzany explained the challenges and some ideas in this PR's description.
Would you be able to take a look and let us know if you're open to merging this feature, and which approach would be preferred? We're happy to work a bit more on the PR, to make it nice according to the chosen approach.
I've sent a contribution through GitHub Sponsors to make up for your time 🙂
Hello, thank you for your sponsorship and the time you took to improve this library.
I researched the topic a bit more and found that the `<col>` element in XLSX was architected under the false assumption that all its attributes are related, when in fact they aren't.
This means we can safely implement the properties independently of each other, and only merge them when we need to write the `<col>` tag.
So the approach I'd take is the following:
- Write different classes: `ColumnWidth` (already done), `ColumnHidden`, `ColumnOutlineLevel` and `ColumnCollapsed`
- Expand the `Sheet` API to allow setting them separately
- Merge them in the `<col>` tag with distinct column ranges. This means that if the user asks for A+B to be hidden and B+C with `OutlineLevel > 1`, we need three `<col>` tags (A, B and C), but for A+B hidden and A+B+C with `OutlineLevel > 1`, only two `<col>` tags are needed (A+B and C); see the sketch after this list
- Completely skip the `Options` class: it was a mistake on my part to consider `ColumnWidth` for the `Workbook`, and it will be removed in the next major release. All these attributes belong only to the `Worksheet`
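To make the range-splitting concrete, here is a hypothetical fragment of the worksheet XML such a merge could produce for the first case above (A+B hidden, B+C with an outline level); the attribute values are illustrative assumptions, not output from the current implementation:

```xml
<!-- Hypothetical merged output for: A+B hidden, B+C with outlineLevel=1 -->
<!-- Column A: hidden only -->
<col min="1" max="1" hidden="true"/>
<!-- Column B: both properties apply, so it needs its own range -->
<col min="2" max="2" hidden="true" outlineLevel="1"/>
<!-- Column C: outline level only -->
<col min="3" max="3" outlineLevel="1"/>
```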
@Slamdunk Thanks so much for your investigation and clear reply! I'll review this with @kamilrzany and we'll schedule some time for refactoring this in our planning. We'll update this PR when it's ready.