
Add support for prefetch argument `--max-size`

jfy133 opened this issue on Dec 13 '21 · 5 comments

Description of feature

I was trying to download some data, and apparently one of the files was 'too big' for the sra-tools prefetch step.

It seems like the solution is given in the error message. I will try specifying it with a custom modules.conf (a sketch of what I mean is below), but if it works I think it would be good to add built-in support :+1:
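
A minimal sketch of the kind of override I have in mind, assuming the module reads extra command-line options from ext.args as current nf-core modules do (the selector pattern and the 50g value are just illustrations):

process {
    withName: '.*:SRATOOLS_PREFETCH' {
        // Raise prefetch's default 20 GB download cap; adjust to taste.
        ext.args = '--max-size 50g'
    }
}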

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)` terminated with an error exit status (3)

Command executed:

  eval "$(vdb-config -o n NCBI_SETTINGS | sed 's/[" ]//g')"
  if [[ ! -f "${NCBI_SETTINGS}" ]]; then
      mkdir -p "$(dirname "${NCBI_SETTINGS}")"
      printf '/LIBS/GUID = "44fc8155-3f0b-4ef8-a7c2-6d375100ae27"\n/libs/cloud/report_instance_identity = "true"\n' > "${NCBI_SETTINGS}"
  fi
  
  retry_with_backoff.sh prefetch \
       \
      --progress \
      SRR059917
  
  vdb-validate SRR059917
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH":
      sratools: $(prefetch --version 2>&1 | grep -Eo '[0-9.]+')
  END_VERSIONS

Command exit status:
  3

Command output:
  
  2021-12-13T11:41:44 prefetch.2.11.0: 1) 'SRR059917' (34GB) is larger than maximum allowed: skipped 
  
  Download of some files was skipped because they are too large
  You can change size download limit by setting
  --min-size and --max-size command line arguments

Command error:
  WARNING: While bind mounting '/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70:/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70': destination is already in the mount point list
  2021-12-13T11:41:44 prefetch.2.11.0 warn: Maximum file size download limit is 20GB 
  2021-12-13T11:41:44 vdb-validate.2.11.0 info: 'SRR059917' could not be found

Work dir:
  /mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

jfy133 · Dec 13 '21

I was wondering while creating the module if there is ever a downside to not limiting the download size at all. I guess it could be somewhat unexpected to get a file that's close to 100 GB or so but then again the user chose the respective IDs... What do you think?

Midnighter · Feb 10 '22

Yeah, I would agree there... you should know what you're downloading. But on the other hand, maybe that's not something people check when fetchngs makes it 'so easy' to download stuff?

jfy133 · Feb 10 '22

I would be okay with setting the default args to `--max-size u` (i.e., no limit); it could then still be overridden.
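
A hypothetical sketch of what that default might look like in the pipeline's conf/modules.config, assuming the standard nf-core ext.args pattern ('u' is prefetch's spelling for unlimited):

process {
    withName: SRATOOLS_PREFETCH {
        // Hypothetical default: 'u' lifts prefetch's 20 GB download cap,
        // and users can still override ext.args in their own config.
        ext.args = '--max-size u'
    }
}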

Midnighter · Feb 10 '22

I get the same error; all my fastq files are above 40 GB. Is there a quick fix? I tried adding --max-size to the nextflow command, but I continue to get the same error.

nextflow run nf-core/fetchngs -c params.config --max-size 60G

royfrancis · Oct 06 '22

In your local config, you can set:

process {
    withName: SRATOOLS_PREFETCH {
        ext.args = '--max-size 60g'
    }
}
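
Save that to a file, e.g. custom.config, and add -c custom.config to your run command alongside your existing -c params.config. Note that a bare --max-size 60G on the nextflow command line only sets a params.max_size value that the pipeline never reads (it doesn't define such a parameter), which is why it had no effect.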

Midnighter · Oct 07 '22

Looks like this is resolved, so closing.

drpatelh · Nov 04 '22

Hi @drpatelh. On NF Tower, as a launch user I don't have the permissions to modify this attribute, so it's not practical when I want to change it for a specific run. Would it be possible to expose this --max-size parameter in the GUI by default?
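
For illustration, a hypothetical sketch of how such a parameter could be wired up; params.max_size is an assumed name, not an existing fetchngs parameter, and it would also need an entry in nextflow_schema.json for Tower to render it in the launch form:

// In nextflow.config (hypothetical parameter, keeping prefetch's current default)
params.max_size = '20g'

// In conf/modules.config
process {
    withName: SRATOOLS_PREFETCH {
        // Closure so the value is resolved from params at task time (hypothetical)
        ext.args = { "--max-size ${params.max_size}" }
    }
}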

azedinez · Mar 20 '23