Issue: FASTQC Process Fails with Exit Code 140 in nf-core/sarek Pipeline Using Singularity
Description of the bug
The nf-core/sarek pipeline is consistently failing during the FASTQC process with exit code 140 when executed on a Slurm-based HPC cluster using Singularity. The same issue occurs when using Docker as the container runtime.
Additional context and observations:
- The error occurs during the `FASTQC` process in both Singularity and Docker executions.
- Other pipelines such as nf-core/rnaseq run without issues in the same environment.
- Running the pipeline with root privileges also fails.
- The warning `Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf` appears, but may not be directly related to the issue.
- The Java I/O error `java.io.IOException: Bad file descriptor` suggests possible file-handling issues within the container.
- The error persists even when Singularity is correctly configured and verified with other workflows.
Request for assistance:
I am seeking help to resolve this issue with the FASTQC process in the nf-core/sarek pipeline. Any guidance on addressing the exit code 140 error would be greatly appreciated, particularly:
- Is this a known issue with FASTQC in nf-core/sarek?
- Could the `java.io.IOException: Bad file descriptor` indicate an underlying issue in the pipeline or the environment?
- Are there specific settings or configurations required for running this pipeline with Singularity on Slurm?
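For context, my current working hypothesis (an assumption on my part, not confirmed) is that exit status 140 means Slurm signalled the task after it hit a resource limit, since nf-core's base config treats 140 as a retryable exit status. A minimal override I am considering for `nextflow.conf`, with illustrative values and the process name taken from the error output below:

```groovy
// Hypothetical override: give the failing FASTQC task more headroom
// and retry it instead of failing the run outright.
process {
    withName: 'NFCORE_SAREK:SAREK:FASTQC' {
        cpus          = 4
        memory        = '16.GB'
        time          = '12.h'
        errorStrategy = 'retry'
        maxRetries    = 2
    }
}
```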
Also posted in the nf-core Slack channel: https://nfcore.slack.com/archives/CE6SDBX2A/p1728540986179659
The command, terminal output, and relevant files are included below.
Command used and terminal output
Command Executed:
```bash
nextflow run nf-core/sarek -profile singularity --input samplesheet.csv --genome hg38 -r 3.2.3 -c nextflow.conf --outdir results_output --wes --known_indels Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --tools mutect2,snpeff --resume
```
Error Output:
Failed Process: NFCORE_SAREK:SAREK:FASTQC (Sample-1)
Command Executed:
```bash
printf "%s %s\n" DNA_Sample-1.R1.fastq.gz Sample-1_1.gz DNA_Sample-1.R2.fastq.gz Sample-1_2.gz | while read old_name new_name; do
    [ -f "${new_name}" ] || ln -s $old_name $new_name
done

fastqc --quiet --threads 8 Sample-1_1.gz Sample-1_2.gz

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:FASTQC":
    fastqc: $( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
```
Exit Code: 140
Command Error:
```
WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
java.io.IOException: Bad file descriptor
```
Relevant files
The following is the script used to launch the job (paths and personal information generalized for privacy):
```bash
#!/bin/bash
#SBATCH --job-name=CARIS_singularity # Job name
#SBATCH -p long
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --ntasks=1 # Run on a single CPU
#SBATCH --mem=10G # Job memory request
#SBATCH --cpus-per-task=1
#SBATCH --output=%x_%j_nobed.log # Standard output and error log
#SBATCH --error=%x_%j_nobed.err
samples="samplesheet.csv"
sarekoutput="results_$SLURM_JOB_NAME"
logdir="/path/to/log/"
logfile="$SLURM_JOB_NAME.txt"
pon="/path/to/required/1000g_pon.hg38.vcf.gz"
pon_tbi="/path/to/required/1000g_pon.hg38.vcf.gz.tbi"
known_indels="/path/to/required/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz"
other=" --germline_resource /path/to/required/af-only-gnomad.raw.sites_mod.vcf.gz --germline_resource_tbi /path/to/required/af-only-gnomad.raw.sites_mod.vcf.gz.tbi --pon $pon --pon_tbi $pon_tbi"
maxmem="256.GB"
igenomes="/path/to/required/"
max_cpu="48"
max_time="600.h"
tools="mutect2,snpeff"
cmd="nextflow run nf-core/sarek -profile singularity --input $samples --genome hg38 -r 3.2.3 -c nextflow.conf --outdir $sarekoutput --wes --known_indels $known_indels --trim_fastq --resume --tools $tools $other"
# Create cache directory if it doesn't exist
if [ ! -d "cache" ]; then
mkdir cache
fi
# Create nextflow config file
read -r -d '' config <<- EOM
params {
config_profile_description = 'bioinfo config'
config_profile_contact = '$SLURM_JOB_USER [email protected]'
}
singularity {
enabled = true
autoMounts = true
cacheDir ='./cache/'
}
executor {
name = 'slurm'
queueSize = 12
}
process {
executor = 'slurm'
queue = { task.time <= 5.h && task.memory <= 10.GB ? 'short': (task.memory <= 95.GB ? 'long' : 'highmem')}
queueSize = 12
}
params {
max_memory = '$maxmem'
max_cpus = $max_cpu
max_time = '$max_time'
}
EOM
echo "$config" > nextflow.conf
# Create log file
message=$(date +"%D %T")" "$(whoami)" "$SLURM_JOB_NAME" "$cmd
echo $message >> $logdir$logfile
# Execute nextflow command
nextflow run nf-core/sarek -profile singularity --input $samples --genome GATK.GRCh38 -r 3.2.3 -c nextflow.conf --outdir $sarekoutput --wes --known_indels $known_indels --trim_fastq --resume --tools mutect2,snpeff
```
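One thing I am unsure about in the generated config is the relative `cacheDir = './cache/'`. If it matters, an absolute cache path on shared storage (as generally recommended for Singularity on clusters) would look like this; the path is a placeholder:

```groovy
// Hypothetical alternative: absolute Singularity cache on shared storage
// instead of the relative './cache/' used above.
singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = '/path/to/shared/singularity_cache'
}
```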
System information
- Nextflow version: 24.04.3
- Hardware: HPC
- Executor: Slurm
- Container engine: Singularity 3.11.0, Docker 24.0.7
- OS: Ubuntu 22.04 (Jammy Jellyfish)
- nf-core/sarek version: 3.2.3
I have attached the final stdout message and the stderr. The output logs show the message described earlier in the screenshot, and in the .err file I am seeing a "missing txt file" error which I do not recognize.
The job ran for 24 hours with a couple of failed jobs, and still produced 2 TB of output in the work directories.
Script Location: The entire script and more details are available on GitHub at this issue link.
Hey guys, I created a draft pull request so I could ask you about the best possible implementation of this feature. I've put my thoughts on it in the PR description in the Challenges section.
@Slamdunk Hey! 👋 We started using OpenSpout in our project. Being able to collapse columns (in XLSX files) is an important feature for us, and I think it is useful for other OpenSpout users as well.
We're currently using our fork of OpenSpout with this feature included. However, we'd prefer to get it merged, so it will be part of the official release and we don't need to keep using our own fork.
This current PR is not completely ready, because we were not sure what would be the best approach. My colleague @kamilrzany explained the challenges and some ideas in this PR's description.
Would you be able to take a look and let us know if you're open to merging this feature, and which approach would be preferred? We're happy to work a bit more on the PR, to make it nice according to the chosen approach.
I've sent a contribution through GitHub Sponsors to make up for your time 🙂
Hello, thank you for your sponsorship and the time you took to improve this library.
I researched the topic a bit more and found that the `<col>` element in XLSX was architected under the false assumption that all its attributes are related, when in fact they aren't.
This means we can safely implement the properties independently of each other, and only merge them when we need to write the `<col>` tag.
So the approach I'd take is the following:
- Write different classes: `ColumnWidth` (already done), `ColumnHidden`, `ColumnOutlineLevel` and `ColumnCollapsed`
- Expand the `Sheet` API to allow setting them separately
- Merge them in the `<col>` tag with distinct column ranges. This means that if the user asks for A+B to be hidden and B+C with `OutlineLevel > 1`, we need three `<col>` tags (A, B and C), but for A+B hidden and A+B+C with `OutlineLevel > 1`, only two `<col>` tags are needed (A+B and C); see the sketch after this list
- Completely skip the `Options` class: it was a mistake on my part to consider `ColumnWidth` for the `Workbook`, and it will be removed in the next major release. All these attributes belong only to the `Worksheet`
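To make the range-splitting concrete, here is a hypothetical fragment of the worksheet XML such a merge could produce for the first case above (A+B hidden, B+C with an outline level); the attribute values are illustrative assumptions, not output from the current implementation:

```xml
<!-- Hypothetical merged output for: A+B hidden, B+C with outlineLevel=1 -->
<!-- Column A: hidden only -->
<col min="1" max="1" hidden="true"/>
<!-- Column B: both properties apply, so it needs its own range -->
<col min="2" max="2" hidden="true" outlineLevel="1"/>
<!-- Column C: outline level only -->
<col min="3" max="3" outlineLevel="1"/>
```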
@Slamdunk Thanks so much for your investigation and clear reply! I'll review this with @kamilrzany and we'll schedule some time for refactoring this in our planning. We'll update this PR when it's ready.