SRAtools download seems to insert paired-end suffix into workdir path
Description of the bug
I am downloading data from SRA using the SRA-tools suite. Prefetch finished without an issue. However, fasterq-dump failed with the following error:
From the looks of it, the _1/_2 suffix is being inserted into the work directory path. Unfortunately, I did not find a reason for this in the code.
Command used and terminal output
nextflow run nf-core/fetchngs \
-profile cbe \
-c fetchngs.config \
-w /scratch/daniel.malzl/work \
-qs 25 \
--input "$@" \
--outdir data \
--force_sratools_download \
-resume
Relevant files
System information
Nextflow version 23.04.2
Looking at the code and the actual command.sh file (see below), I think this is a fasterq-dump issue, because the error message seems to be printed by fasterq-dump itself.
#!/bin/bash -euo pipefail
export NCBI_SETTINGS="$PWD/user-settings.mkfg"
fasterq-dump \
--split-files --include-technical \
--threads 6 \
--outfile SRX10737613_SRR14385311 \
\
SRR14385311
pigz \
\
--no-name \
--processes 6 \
*.fastq
cat <<-END_VERSIONS > versions.yml
"NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOLS_FASTERQDUMP":
sratools: $(fasterq-dump --version 2>&1 | grep -Eo '[0-9.]+')
pigz: $( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
I have now opened an issue with them: https://github.com/ncbi/sra-tools/issues/865#issue-1945238079
I am able to reproduce this locally when running nf-test:
nf-test test --verbose --tag sra_default_parameters --profile test,docker
As outlined in https://github.com/ncbi/sra-tools/issues/865#issuecomment-1878538830, fasterq-dump is unable to resolve paths that contain a "." when writing the output files.
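For anyone wanting to check whether their setup might be affected, here is a minimal sketch that tests whether a path contains a dot. The function name `path_has_dot` is hypothetical (it is not part of sra-tools or fetchngs), and the check is deliberately coarse: it flags any dot anywhere in the path, on the assumption that this is the trigger described in the linked comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sra-tools or fetchngs):
# succeeds when the given path contains at least one '.' character.
path_has_dot() {
    case "$1" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example: warn before launching fasterq-dump from the current directory.
if path_has_dot "$PWD"; then
    echo "warning: '$PWD' contains a '.'; affected fasterq-dump versions may misplace output" >&2
fi
```

With the work directory from the original report, `path_has_dot "/scratch/daniel.malzl/work"` succeeds, which would be consistent with the failure seen there.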
I hit the same bug (my email address, which is full of dots, is part of my path). I fixed this on my side for paired-end reads with the following code in fetchngs/modules/nf-core/sratools/fasterqdump/main.nf:
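# Workaround: the affected fasterq-dump appears to insert the _1/_2 read suffix
# before the last dot of the output path, so pre-create those directories too.
mkdir -p TEST.OUT TEST_1.OUT TEST_2.OUT
fasterq-dump \\
$args \\
--threads $task.cpus \\
--outfile TEST.OUT/$outfile \\
${key_file} \\
${sra}
# Move the misplaced paired-end files back into the task directory.
mv -v "TEST_1.OUT/${outfile}" ./${outfile}.1.fastq
mv -v "TEST_2.OUT/${outfile}" ./${outfile}.2.fastq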
pigz \\
$args2 \\
--no-name \\
--processes $task.cpus \\
*.fastq
(...)
This should be fixed in the next pipeline release by downgrading the version of sratools until it is fixed upstream. See https://github.com/nf-core/fetchngs/pull/261
I will leave this issue open until then as fixing upstream is the desired resolution. In the meantime, anyone wanting to use more recent support for ngc files can use sratools v3.0.8 via a custom config file passed to the pipeline:
process {
withName: 'SRATOOLS_FASTERQDUMP' {
container = 'quay.io/biocontainers/mulled-v2-5f89fe0cd045cb1d615630b9261a1d17943a9b6a:2f4a4c900edd6801ff0068c2b3048b4459d119eb-0'
}
}
This issue was fixed in SRA Toolkit 3.2.1 (https://github.com/ncbi/sra-tools/blob/master/CHANGES.md). I would be grateful if you could push this version of sratools.
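To check whether a given installation already includes the fix, something like the following sketch could be used. The helper `version_ge` is hypothetical and relies on `sort -V` from GNU coreutils; the version extraction mirrors the one in the pipeline's command.sh above, and the 3.2.1 threshold is taken from the changelog cited here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the tool if it is installed.
if command -v fasterq-dump >/dev/null 2>&1; then
    installed="$(fasterq-dump --version 2>&1 | grep -Eo '[0-9.]+' | head -n 1)"
    if version_ge "$installed" "3.2.1"; then
        echo "fasterq-dump $installed should include the fix"
    else
        echo "fasterq-dump $installed predates the fix reported for 3.2.1" >&2
    fi
fi
```

The comparison is done on version strings rather than by plain string ordering, so a release such as 3.10.0 is correctly treated as newer than 3.2.1.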