
SRAtools download seems to insert paired-end suffix into workdir path

Open dmalzl opened this issue 2 years ago • 7 comments

Description of the bug

I am downloading data from SRA using the SRA-tools suite. Prefetch finished without an issue. However, fasterq-dump failed with the error shown in the attached screenshot.

From the looks of it, the _1/_2 suffix is being inserted into the work directory path. Unfortunately, I could not find a reason for this in the code.

Command used and terminal output

nextflow run nf-core/fetchngs \
        -profile cbe \
        -c fetchngs.config \
        -w /scratch/daniel.malzl/work \
        -qs 25 \
        --input "$@" \
        --outdir data \
        --force_sratools_download \
        -resume

Relevant files

nextflow.log

System information

Nextflow version 23.04.2

dmalzl • Oct 16 '23 12:10

Looking at the code and the actual .command.sh file (see below), I think this is a fasterq-dump issue, because the error message appears to be printed by fasterq-dump itself.

#!/bin/bash -euo pipefail
export NCBI_SETTINGS="$PWD/user-settings.mkfg"

fasterq-dump \
    --split-files --include-technical \
    --threads 6 \
    --outfile SRX10737613_SRR14385311 \
     \
    SRR14385311

pigz \
     \
    --no-name \
    --processes 6 \
    *.fastq

cat <<-END_VERSIONS > versions.yml
"NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOLS_FASTERQDUMP":
    sratools: $(fasterq-dump --version 2>&1 | grep -Eo '[0-9.]+')
    pigz: $( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS

dmalzl • Oct 16 '23 13:10

I have now opened an issue with them: https://github.com/ncbi/sra-tools/issues/865#issue-1945238079

dmalzl • Oct 16 '23 13:10

I am able to reproduce this locally when running nf-test:

nf-test test --verbose --tag sra_default_parameters  --profile test,docker

drpatelh • Jan 05 '24 11:01

As outlined in https://github.com/ncbi/sra-tools/issues/865#issuecomment-1878538830, fasterq-dump cannot correctly resolve paths that contain a '.' when writing its output files.

drpatelh • Jan 05 '24 11:01
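This fits the original report: the work directory passed with -w, /scratch/daniel.malzl/work, contains a dot, and the _1/_2 suffix ended up inside that path. The bash sketch below is a hypothetical illustration, not sra-tools code. It applies the same "insert _N before the first '.'" rule once to the file name only (the expected result) and once to the whole path (one plausible model of the bug). The function names and the ab/ hash directory are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the symptom; this is NOT fasterq-dump's code.

# Insert "_<n>" before the first '.' in a string, or append it if there is no '.'.
insert_before_first_dot() {
  local s=$1 n=$2
  if [[ $s == *.* ]]; then
    printf '%s_%s.%s\n' "${s%%.*}" "$n" "${s#*.}"
  else
    printf '%s_%s\n' "$s" "$n"
  fi
}

# Expected behaviour: apply the rule to the file name only.
suffix_file_name() {
  local path=$1 n=$2
  if [[ $path == */* ]]; then
    printf '%s/%s\n' "${path%/*}" "$(insert_before_first_dot "${path##*/}" "$n")"
  else
    insert_before_first_dot "$path" "$n"
  fi
}

# Assumed model of the bug: the same rule applied to the whole path, so the
# first dotted directory receives the suffix instead of the file name.
suffix_whole_path() {
  insert_before_first_dot "$1" "$2"
}

p=/scratch/daniel.malzl/work/ab/SRR14385311.fastq
suffix_file_name "$p" 1    # /scratch/daniel.malzl/work/ab/SRR14385311_1.fastq
suffix_whole_path "$p" 1   # /scratch/daniel_1.malzl/work/ab/SRR14385311.fastq
```

Under this model, any dotted directory in the path (a user name such as daniel.malzl, an email address) is enough to redirect the output into a directory that does not exist.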

I hit the same bug (my email address, which is full of dots, is part of my path). I fixed it on my side for paired-end reads with the following code in fetchngs/modules/nf-core/sratools/fasterqdump/main.nf:

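    # Workaround (assumes fasterq-dump inserts the _1/_2 suffix at the first '.'
    # of the --outfile path): pre-create the directories it will actually write to.
    mkdir -p TEST.OUT TEST_1.OUT TEST_2.OUT
    fasterq-dump \\
        $args \\
        --threads $task.cpus \\
        --outfile TEST.OUT/$outfile \\
        ${key_file} \\
        ${sra}

    # Move the mate files into the task directory so that pigz (*.fastq) picks them up
    mv -v "TEST_1.OUT/${outfile}" ./${outfile}.1.fastq
    mv -v "TEST_2.OUT/${outfile}" ./${outfile}.2.fastq

    pigz \\
        $args2 \\
        --no-name \\
        --processes $task.cpus \\
        *.fastq
(...)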

lindenb • Jan 25 '24 14:01
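Whether a run is exposed at all depends on whether any directory in the output path contains a '.', as with the email-based path above or /scratch/daniel.malzl/work in the original report. Below is a hypothetical pre-flight check that could be run on the directory passed to -w before launching the pipeline. The function name path_has_dot is made up, and nothing like it is part of nf-core/fetchngs.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of nf-core/fetchngs or sra-tools.
# Succeeds (returns 0) if any real directory component of the path contains a '.'.
path_has_dot() {
  local path=$1 part
  local -a parts
  # Make relative paths absolute so that parent directories are checked too.
  [[ $path == /* ]] || path="$PWD/$path"
  # Split on '/' without glob expansion.
  IFS=/ read -r -a parts <<< "$path"
  for part in "${parts[@]}"; do
    case $part in
      '' | . | ..) continue ;;  # empty, current-dir and parent-dir components
      *.*) return 0 ;;          # a real component with a dot, e.g. "daniel.malzl"
    esac
  done
  return 1
}

if path_has_dot /scratch/daniel.malzl/work; then
  echo "warning: work directory path contains a '.'" >&2
fi
```

Components such as "." and ".." are skipped on purpose, so a harmless path like ./work is not flagged just because of its notation.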

This should be fixed in the next pipeline release by downgrading the version of sratools until it is fixed upstream. See https://github.com/nf-core/fetchngs/pull/261

I will leave this issue open until it is fixed upstream, as that is the desired resolution. In the meantime, anyone who needs the more recent support for ngc files can use sratools v3.0.8 via a custom config file passed to the pipeline with -c:

process {
    withName: 'SRATOOLS_FASTERQDUMP' {
        // Override the fasterq-dump container to use sratools v3.0.8
        container = 'quay.io/biocontainers/mulled-v2-5f89fe0cd045cb1d615630b9261a1d17943a9b6a:2f4a4c900edd6801ff0068c2b3048b4459d119eb-0'
    }
}

drpatelh • Jan 30 '24 17:01

This issue was fixed in SRA Toolkit 3.2.1 (https://github.com/ncbi/sra-tools/blob/master/CHANGES.md). I would be grateful if you could update the pipeline to this version of sratools.

SaiH99 • Oct 01 '25 01:10
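One way to check whether the fasterq-dump in a given environment or container is at least the release reported to contain the fix (3.2.1, per the changelog linked above) is to compare versions with sort -V. This is a hedged sketch. It assumes a sort that supports -V (GNU coreutils does), it reuses the fasterq-dump --version | grep -Eo pipeline from the versions.yml step above, and the function names are hypothetical. A version below 3.2.1 is not necessarily affected, since the thread also reports that downgrading sratools avoids the bug.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of nf-core/fetchngs or sra-tools.

# Succeeds if version $1 >= version $2, using version-aware sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Same extraction as the versions.yml step, limited to the first match.
sratools_version() {
  fasterq-dump --version 2>&1 | grep -Eo '[0-9.]+' | head -n 1
}

fixed_version=3.2.1  # release reported in this thread to contain the fix

if command -v fasterq-dump > /dev/null 2>&1; then
  v=$(sratools_version)
  if version_ge "$v" "$fixed_version"; then
    echo "fasterq-dump $v includes the reported fix"
  else
    echo "fasterq-dump $v predates $fixed_version; check whether it is affected"
  fi
fi
```

sort -V is used instead of a plain string comparison because it orders 3.10.0 after 3.2.1, which a lexical comparison would get wrong.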