Add support for prefetch argument `--max-size`
Description of feature
I was trying to download some data, and apparently one of the files was 'too big' for the sra-tools prefetch step.
The solution seems to be given in the error message below. I will try specifying it with a custom modules.conf, but if it works I think it would be good to add built-in support :+1:
```
Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)` terminated with an error exit status (3)

Command executed:

  eval "$(vdb-config -o n NCBI_SETTINGS | sed 's/[" ]//g')"
  if [[ ! -f "${NCBI_SETTINGS}" ]]; then
      mkdir -p "$(dirname "${NCBI_SETTINGS}")"
      printf '/LIBS/GUID = "44fc8155-3f0b-4ef8-a7c2-6d375100ae27"\n/libs/cloud/report_instance_identity = "true"\n' > "${NCBI_SETTINGS}"
  fi

  retry_with_backoff.sh prefetch \
       \
      --progress \
      SRR059917

  vdb-validate SRR059917

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH":
      sratools: $(prefetch --version 2>&1 | grep -Eo '[0-9.]+')
  END_VERSIONS

Command exit status:
  3

Command output:
  2021-12-13T11:41:44 prefetch.2.11.0: 1) 'SRR059917' (34GB) is larger than maximum allowed: skipped
  Download of some files was skipped because they are too large
  You can change size download limit by setting
  --min-size and --max-size command line arguments

Command error:
  WARNING: While bind mounting '/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70:/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70': destination is already in the mount point list
  2021-12-13T11:41:44 prefetch.2.11.0 warn: Maximum file size download limit is 20GB
  2021-12-13T11:41:44 vdb-validate.2.11.0 info: 'SRR059917' could not be found

Work dir:
  /mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
```
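For reference, the limit the error message is talking about belongs to `prefetch` itself and can be raised (or removed) when the tool is called directly; the size value below is just an example:

```bash
# Raise the 20 GB default limit for this accession (50G is an example value)
prefetch --max-size 50G --progress SRR059917

# Or lift the limit entirely
prefetch --max-size u --progress SRR059917
```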
I was wondering, while creating the module, whether there is ever a downside to not limiting the download size at all. I guess it could be somewhat unexpected to get a file that's close to 100 GB or so, but then again the user chose the respective IDs... What do you think?
Yeah, I would agree there... you should know what you're downloading. But on the other hand, maybe that's not something people check when fetchngs makes it 'so easy' to download stuff?
I would be okay with setting the default args to `--max-size u`; then it can still be overwritten.
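A minimal sketch of what that default could look like, assuming it lives in the pipeline's `conf/modules.config` (the usual nf-core location) and uses the same `ext.args` mechanism shown later in this thread; the exact selector may differ in the pipeline:

```groovy
// conf/modules.config -- illustrative only, path and selector per nf-core convention
process {
    withName: SRATOOLS_PREFETCH {
        // No size limit by default; users can still override ext.args in their own config
        ext.args = '--max-size u'
    }
}
```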
I get the same error; all my FASTQ files are above 40 GB. Is there a quick fix? I tried adding `--max-size` to the Nextflow command, but I continue to get the same error:

```bash
nextflow run nf-core/fetchngs -c params.config --max-size 60G
```
In your local config, you can set:

```groovy
process {
    withName: SRATOOLS_PREFETCH {
        ext.args = '--max-size 60g'
    }
}
```
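To apply this, save the snippet to a file (`custom.config` below is just an example name) and pass it to the run with an extra `-c`; Nextflow merges multiple `-c` config files:

```bash
# custom.config is an illustrative filename
nextflow run nf-core/fetchngs -c params.config -c custom.config
```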
Looks like this is resolved, so closing.
Hi @drpatelh. On NF Tower, since I'm a launch user I don't have permission to modify this attribute, so it's not practical when I want to change it for a specific run. Would it be possible to expose this `--max-size` parameter in the GUI by default?