
wget host address error

Open mniederhuber opened this issue 1 year ago • 9 comments

Description of the bug

I tried out the dev branch and am encountering a wget error in the process NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP. The underlying error is:

wget: unable to resolve host address 'ftp.sra.ebi.ac.uk'

I get this error for a number of SRX experiment IDs that have downloaded successfully with sra-tools in the past.

I'll try to see if I can figure out the issue, but figured I'd bring it up.

Command used and terminal output

#! /bin/bash
#SBATCH --mem=8G
#SBATCH -t 6:00:00
#SBATCH -p general
#SBATCH -o var/log/fetch-%j.out
#SBATCH -e var/log/fetch-%j.err

module load nextflow

nextflow -log var/log/.fetchngs run nf-core/fetchngs -r dev \
	-profile unc_longleaf \
	-params-file config/fetchngs_params.yaml

Relevant files

logfile.txt

System information

Nextflow 23.04.02, HPC (Slurm), Singularity, RHEL8, fetchngs dev

mniederhuber avatar Feb 23 '24 15:02 mniederhuber

Could be intermittent network or server issues. ENA/SRA do see a lot of traffic.

Midnighter avatar Feb 25 '24 17:02 Midnighter

I'm experiencing the same issue with wget using the dev branch. Were you able to get this to work?

CJPerkins1 avatar Mar 11 '24 23:03 CJPerkins1

This is because of a problem with the Singularity container. A certain generation of containers was built with a Busybox that had a broken /etc/resolv.conf. I have reported this to the Galaxy folks who build the Singularity containers and will follow up once that is fixed.
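
A minimal sketch for checking whether a root filesystem has a usable resolv.conf. The function name has_usable_resolv_conf and its ROOT parameter are illustrative, not part of the pipeline or of Singularity:

```shell
#!/bin/sh
# has_usable_resolv_conf ROOT
# Succeeds only if ROOT/etc/resolv.conf exists and lists at least one nameserver.
# ROOT is a hypothetical parameter: empty for the live system, or the path
# of an unpacked image directory.
has_usable_resolv_conf() {
    conf="${1:-}/etc/resolv.conf"
    if [ ! -f "$conf" ]; then
        echo "missing: $conf"
        return 1
    fi
    if ! grep -q '^[[:space:]]*nameserver[[:space:]]' "$conf"; then
        echo "no nameserver entries: $conf"
        return 2
    fi
    echo "ok: $conf"
}
```

The same test can probably be run inside an image with something like `singularity exec <image> test -f /etc/resolv.conf`, assuming the image ships a `test` utility. A non-zero exit status would match the "doesn't exist in container" warning reported in this thread.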

pvanheus avatar Mar 16 '24 15:03 pvanheus

I think the problem is the container.

$ module load singularity-ce/4.1.0
$ singularity shell depot.galaxyproject.org-singularity-wget-1.20.1.img
WARNING: Skipping mount /var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
Singularity> wget -t 5 -nv -c -T 60 -O ERX2235404_ERR2179103_2.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/ERR217/003/ERR2179103/ERR2179103_2.fastq.gz
wget: unable to resolve host address 'ftp.sra.ebi.ac.uk'

However, if I try the latest version of the container (check https://depot.galaxyproject.org/singularity/):

$ singularity pull https://depot.galaxyproject.org/singularity/wget:1.21.4
$ singularity shell wget\:1.21.4
Singularity> wget -t 5 -nv -c -T 60 -O ERX2235404_ERR2179103_2.fastq.gz ftp.sra.ebi.ac.uk/vol1/fastq/ERR217/003/ERR2179103/ERR2179103_2.fastq.gz
Singularity> ls ERX2235404_ERR2179103_2.fastq.gz
ERX2235404_ERR2179103_2.fastq.gz

So, I guess the solution is to instruct Nextflow to fetch the latest image in modules/local/sra_fastq_ftp/main.nf:

    conda "conda-forge::wget=1.20.1"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/wget:1.20.1' :
        'biocontainers/wget:1.20.1' }"

Change this to the following (conda too, for consistency, though I haven't tested it):

    conda "conda-forge::wget=1.21.4"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/wget:1.21.4' :
        'biocontainers/wget:1.21.4' }"

josemunozc avatar May 08 '24 12:05 josemunozc

Sorry, I just realized that the suggested change already made it to the dev branch :p

josemunozc avatar May 08 '24 12:05 josemunozc

I can confirm, updating the wget container to v.1.21.4 (with fe2756912803b988a3407586c7264578b0c147f2) fixed this issue.

maxibor avatar Oct 07 '24 13:10 maxibor

Hello! I am still getting this error even with the fix suggested above. The line before the wget error has a warning regarding my singularity. Could this be part of the problem?

Command error:
  WARNING: Skipping mount /usr/local/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  wget: unable to resolve host address 'ftp.sra.ebi.ac.uk'

Thanks!

LKeene avatar Oct 09 '24 20:10 LKeene

Dear @LKeene, I got successful results when using conda, i.e. with -profile conda:

nextflow run nf-core/fetchngs -r 1.12.0 -profile conda --input ids.csv --outdir results_naga_test -c ibex.config

IBEXCluster avatar Oct 13 '24 19:10 IBEXCluster

I was able to get successful results when conda is used.

A bummer that users are currently forced to use conda instead of Singularity, at least for now.

nick-youngblut avatar Oct 16 '24 17:10 nick-youngblut

Hello everyone! 👋

As I mentioned in https://github.com/nf-core/fetchngs/issues/328, I hit the same wget: unable to resolve host address 'ftp.sra.ebi.ac.uk' failure on v1.12.0 and worked around it locally by applying the changes from PR #338, following @JulianFlesch's suggestion (Thanks, Julian! 💙).

Initial fix (DNS resolution):

  • Run nextflow pull nf-core/fetchngs
  • Inside .nextflow/assets/nf-core/fetchngs/modules/local/sra_fastq_ftp/main.nf:
    • Bump conda + container definitions to wget=1.21.4
    • Prefix the FASTQ URLs with ftp:// so wget sees a complete URL
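
A minimal shell sketch of the prefixing rule from the last bullet. The function name ensure_ftp_scheme is illustrative and not part of the pipeline; the actual change lives in the Nextflow module:

```shell
#!/bin/sh
# ensure_ftp_scheme URL
# Prints URL unchanged if it already starts with ftp://, http:// or https://;
# otherwise prints it with ftp:// prepended.
ensure_ftp_scheme() {
    case "$1" in
        ftp://*|http://*|https://*) printf '%s\n' "$1" ;;
        *)                          printf 'ftp://%s\n' "$1" ;;
    esac
}
```

For example, `ensure_ftp_scheme ftp.sra.ebi.ac.uk/vol1/fastq/ERR217/003/ERR2179103/ERR2179103_2.fastq.gz` prints the same path with ftp:// in front, while a URL that already carries a scheme passes through untouched.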

Since GitHub doesn't support attaching .nf files, I've included the updated content of .nextflow/assets/nf-core/fetchngs/modules/local/sra_fastq_ftp/main.nf at the end of this message.

Additional tweaks for large batches (server throttling):

When downloading many files (e.g., 400+ IDs), ENA's FTP servers can throttle concurrent connections, causing "Error in server response. Closing." messages. To reduce these, I created a custom.config file that:

  • Limits concurrent downloads (maxForks = 6) to avoid overwhelming the server
  • Increases wget retries and timeouts (-t 10 -T 120 --waitretry=30 --retry-connrefused) for more resilience
  • Allows more Nextflow-level retries (maxRetries = 4) for processes that fail after wget's internal retries
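
The layering in the bullets above (an outer retry around a command that already retries internally) can be sketched in plain shell. The helper name retry is hypothetical, and this is only an illustration of the pattern, not how Nextflow implements maxRetries:

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Runs CMD until it succeeds, making at most MAX attempts and sleeping
# DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
    _max="$1"
    _delay="$2"
    _attempt=1
    shift 2
    until "$@"; do
        if [ "$_attempt" -ge "$_max" ]; then
            echo "giving up after $_attempt attempts: $*" >&2
            return 1
        fi
        _attempt=$((_attempt + 1))
        sleep "$_delay"
    done
}
```

Wrapping a download would then look like `retry 4 30 wget -t 10 -T 120 --waitretry=30 --retry-connrefused -O out.fastq.gz "$url"`, where out.fastq.gz and $url are placeholders. wget's own -t retries run first, and the outer loop only starts a fresh attempt once wget has given up.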

I then run the pipeline with -c custom.config -resume, for example:

nextflow run nf-core/fetchngs -r 1.12.0 -profile singularity --input ids.csv --outdir data/raw -c custom.config -resume

This successfully downloaded 800+ files from my full dataset. The custom.config content is also included below.

Note: I only tested with Singularity. Hopefully, this also fixes the issue in other configuration profiles (e.g., Docker).

Until 1.13.0 lands, this manual patch seems stable. Hope it helps! 🤞


Note: pulling a new pipeline release will revert the edits living under .nextflow/assets, so reapply them if needed.

Updated main.nf file:


process SRA_FASTQ_FTP {
    tag "$meta.id"
    label 'process_low'
    label 'error_retry'

    conda "conda-forge::wget=1.21.4"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/wget:1.21.4' :
        'biocontainers/wget:1.21.4' }"


    input:
    tuple val(meta), val(fastq)

    output:
    tuple val(meta), path("*fastq.gz"), emit: fastq
    tuple val(meta), path("*md5")     , emit: md5
    path "versions.yml"               , emit: versions

    script:
    def args = task.ext.args ?: ''
    // Ensure URLs have a protocol prefix; default to ftp:// when none is present
    def withScheme = { url -> ['ftp://', 'http://', 'https://'].any { prefix -> url.startsWith(prefix) } ? url : "ftp://${url}" }
    def fastq0 = withScheme(fastq[0])
    def fastq1 = fastq.size() > 1 ? withScheme(fastq[1]) : ''
    if (meta.single_end) {
        """
        wget \\
            $args \\
            -O ${meta.id}.fastq.gz \\
            ${fastq0}

        echo "${meta.md5_1}  ${meta.id}.fastq.gz" > ${meta.id}.fastq.gz.md5
        md5sum -c ${meta.id}.fastq.gz.md5

        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            wget: \$(echo \$(wget --version | head -n 1 | sed 's/^GNU Wget //; s/ .*\$//'))
        END_VERSIONS
        """
    } else {
        """
        wget \\
            $args \\
            -O ${meta.id}_1.fastq.gz \\
            ${fastq0}

        echo "${meta.md5_1}  ${meta.id}_1.fastq.gz" > ${meta.id}_1.fastq.gz.md5
        md5sum -c ${meta.id}_1.fastq.gz.md5

        wget \\
            $args \\
            -O ${meta.id}_2.fastq.gz \\
            ${fastq1}

        echo "${meta.md5_2}  ${meta.id}_2.fastq.gz" > ${meta.id}_2.fastq.gz.md5
        md5sum -c ${meta.id}_2.fastq.gz.md5

        cat <<-END_VERSIONS > versions.yml
        "${task.process}":
            wget: \$(echo \$(wget --version | head -n 1 | sed 's/^GNU Wget //; s/ .*\$//'))
        END_VERSIONS
        """
    }
}

custom.config file (for large batches):

/*
 * custom.config
 * Use with: nextflow run nf-core/fetchngs ... -c custom.config -resume
 */

process {
    // Let Nextflow retry failing processes up to 4 times instead of 2
    withLabel: error_retry {
        errorStrategy = 'retry'
        maxRetries    = 4
    }

    // Tweak the SRA_FASTQ_FTP step (wget downloads)
    withName: 'NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP' {
        // Limit concurrent downloads to avoid overwhelming the server
        maxForks = 6
        // More forgiving wget flags
        ext.args = '-t 10 -nv -c -T 120 --waitretry=30 --retry-connrefused'
    }
}

fmerinocasallo avatar Nov 21 '25 06:11 fmerinocasallo