krakenuniq
Error fetching genomes
When I'm using krakenuniq-download to download refseq genomes, every so often it will raise an error "Error fetching [ftp link]. Is curl installed?"
curl is definitely installed. That can't be the problem, since it is succeeding in fetching the vast majority of the genomes it tries to fetch. The ftp link it reports the error about always works when I try it in the browser, and it's not always the same number of links causing problems on subsequent attempts. Any idea how to troubleshoot this?
Thanks
Hi, just checking whether there is any solution to this?
We are having the same issue when we use the command
krakenuniq-download --db DBDIR --threads 10 --dust refseq/bacteria refseq/archaea
It started failing after 7389/15418 and continues to give the abovementioned error.
Other krakenuniq-download commands went well.
thanks for looking into this
Same here. Has anyone solved this issue? Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/664/025/GCF_009664025.1_ASM966402v1/GCF_009664025.1_ASM966402v1_genomic.fna.gz. Is curl installed?
I get the same error:
krakenuniq-download --threads 8 --dust --db bactarch.template refseq/bacteria refseq/archaea
Downloading assembly summary file for bacteria genomes, and filtering to assembly level Complete_Genome.
Downloading bacteria genomes: 5272/16639 ... Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/680/025/GCF_001680025.1_ASM168002v1/GCF_001680025.1_ASM168002v1_genomic.fna.gz. Is curl installed?
Downloading bacteria genomes: 8496/16639 ... Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/168/635/GCF_000168635.2_ASM16863v2/GCF_000168635.2_ASM16863v2_genomic.fna.gz. Is curl installed?
Downloading bacteria genomes: 10759/16639 ...
The first 5k genomes download, then the "Is curl installed?" error occurs intermittently.
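For anyone troubleshooting this, it may help to collect the failing links from a saved copy of the output. The snippet below is a minimal sketch, assuming the error lines always follow the "Error fetching <link>. Is curl installed?" pattern shown above; the function name failed_urls is made up for illustration and is not part of krakenuniq.

```python
import re

# Pattern based on the error lines quoted in this thread (an assumption:
# other krakenuniq-download versions may word the message differently).
ERROR_PATTERN = re.compile(r"Error fetching (\S+?)\. Is curl installed\?")


def failed_urls(log_text):
    """Return the links reported as failed, in order, without duplicates."""
    matches = ERROR_PATTERN.findall(log_text)
    # dict.fromkeys keeps the first occurrence of each link and preserves order.
    return list(dict.fromkeys(matches))
```

Running it over the saved output of two attempts shows whether the same links fail each time or a different set does.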
I have the same issue. Does anyone know the cause and/or fix?
This is caused by NCBI changing their FTP site setup, which they do frequently and which we can't control. However, we are now putting out KrakenUniq/Kraken1 indices for download on Ben Langmead's index page (https://benlangmead.github.io/aws-indexes/k2). We just put the "standard" database there, which will include files needed for Kraken 1, KrakenUniq, and Bracken, and we're going to put a larger database there too, which will add hundreds of eukaryotic pathogens from EuPathDB. The standard database includes all RefSeq bacteria, archaea, viruses, and human.
Thank you. That would be great!
I solved the problem for now by manually removing the genomes that throw an error, and using the --rsync flag on krakenuniq-download. After a few iterations, all genomes were downloaded correctly.
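Since different links fail on different attempts, simply retrying a failed link with a pause in between might also work. This is only a sketch of that idea, not how krakenuniq-download behaves: fetch_with_retry and its parameters are hypothetical, and it uses Python's urllib in place of curl or rsync.

```python
import time
import urllib.request


def fetch_with_retry(url, dest, attempts=5, delay=2.0,
                     fetch=urllib.request.urlretrieve, sleep=time.sleep):
    """Download url to dest, retrying on failure.

    Returns the number of the attempt that succeeded. The fetch and sleep
    arguments are replaceable so the logic can be tried without a network.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, dest)
            return attempt
        except OSError as err:  # urllib's URLError is a subclass of OSError
            last_error = err
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                sleep(delay * attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

Calling fetch_with_retry(link, filename) for each link that failed would fetch the missing files one by one; where the files then need to go inside the database directory is not covered here.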
@salzberg I noticed that, as of now, the Kraken2 databases at https://benlangmead.github.io/aws-indexes/k2 have been periodically updated, with the most recent one being from March 2023, while the latest version for KrakenUniq is from June 2022. I realize that the KrakenUniq databases are much larger and more difficult to create, but are there any plans for uploading an updated version of this as well?
Yes, we do plan to update them, but they are huge, so this won't happen as often. We have several smaller ones, more specialized, so we might try to add those. Btw I don't have much funding for this, but I keep it going as best I can.