
Error fetching genomes

Open JCSzamosi opened this issue 5 years ago • 8 comments

When I'm using krakenuniq-download to download refseq genomes, every so often it will raise an error "Error fetching [ftp link]. Is curl installed?"

curl is definitely installed. That can't be the problem, since it is succeeding in fetching the vast majority of the genomes it tries to fetch. The ftp link it reports the error about always works when I try it in the browser, and it's not always the same number of links causing problems on subsequent attempts. Any idea how to troubleshoot this?

Thanks

JCSzamosi avatar Mar 27 '19 16:03 JCSzamosi

Hi, just checking whether there is any solution to this.

We are having the same issue when we use the command

krakenuniq-download --db DBDIR --threads 10 --dust refseq/bacteria refseq/archaea

It started failing after 7389/15418 and continues to give the abovementioned error.

Other krakenuniq-download commands went well.

thanks for looking into this

asrivathsan avatar Nov 13 '19 06:11 asrivathsan

Same here. Has anyone solved this issue? Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/664/025/GCF_009664025.1_ASM966402v1/GCF_009664025.1_ASM966402v1_genomic.fna.gz. Is curl installed?

christopher047 avatar Feb 28 '20 12:02 christopher047

I get the same error:

krakenuniq-download --threads 8 --dust --db bactarch.template refseq/bacteria refseq/archaea
Downloading assembly summary file for bacteria genomes, and filtering to assembly level Complete_Genome.
 Downloading bacteria genomes:  5272/16639 ... Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/680/025/GCF_001680025.1_ASM168002v1/GCF_001680025.1_ASM168002v1_genomic.fna.gz. Is curl installed?
 Downloading bacteria genomes:  8496/16639 ... Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/168/635/GCF_000168635.2_ASM16863v2/GCF_000168635.2_ASM16863v2_genomic.fna.gz. Is curl installed?
 Downloading bacteria genomes:  10759/16639 ...

The first 5k genomes download fine, then the "Is curl installed?" error occurs intermittently.
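To see which genomes failed in a run like the one above, the error lines can be collected from saved output. The following is a minimal Python sketch, assuming the output was captured to a string and that the message format matches the lines shown here ("Error fetching <url>. Is curl installed?"). The function name `failed_urls` is hypothetical and is not part of krakenuniq-download.

```python
import re

# Matches the message format seen in the output above (an assumption:
# other versions of the tool may word the error differently).
FETCH_ERROR = re.compile(r"Error fetching (\S+)\. Is curl installed\?")


def failed_urls(log_text):
    """Return the URLs named in 'Error fetching' messages, in order, without duplicates."""
    seen = []
    for url in FETCH_ERROR.findall(log_text):
        if url not in seen:
            seen.append(url)
    return seen
```

The resulting list can then be checked by hand in a browser or passed to whatever retry approach is being used.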

joshua-theisen avatar Mar 07 '20 16:03 joshua-theisen

I have the same issue. Does anyone know the cause and/or fix?

CuypersBart avatar Jun 22 '22 11:06 CuypersBart

This is caused by NCBI changing their ftp site setup, which they do frequently and which we can't control. However, we are now putting out KrakenUniq/Kraken1 indices for download on Ben Langmead's index page here: https://benlangmead.github.io/aws-indexes/k2. We just put the "standard" database there, which will include files needed for Kraken 1, KrakenUniq, and Bracken, and we're going to put a larger database there too, which will add hundreds of eukaryotic pathogens from EuPathDB. The standard database includes all RefSeq bacteria, archaea, viruses, and human.

salzberg avatar Jun 22 '22 12:06 salzberg

Thank you. That would be great!

I solved the problem for now by manually removing the genomes that throw an error, and using the --rsync flag on krakenuniq-download. After a few iterations, all genomes were downloaded correctly.
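The general idea of iterating until nothing fails can be sketched in Python as below. This is only an illustration of the retry pattern, not what krakenuniq-download or its --rsync flag does internally. The names `download_with_retries` and `fetch` are hypothetical; `fetch` stands for any callable that tries one URL and returns True on success.

```python
import time


def download_with_retries(urls, fetch, max_passes=3, pause_seconds=0.0):
    """Try every URL, then re-attempt only the failures on later passes.

    `fetch` is a hypothetical callable: fetch(url) -> True on success.
    Returns (downloaded, still_failing).
    """
    downloaded = []
    pending = list(urls)
    for attempt in range(max_passes):
        if not pending:
            break
        # Optionally wait between passes so a transient server problem can clear.
        if attempt > 0 and pause_seconds > 0:
            time.sleep(pause_seconds)
        failures = []
        for url in pending:
            if fetch(url):
                downloaded.append(url)
            else:
                failures.append(url)
        pending = failures
    return downloaded, pending
```

Anything left in the second return value after the last pass still needs attention, for example a manual download.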

CuypersBart avatar Jun 24 '22 13:06 CuypersBart

@salzberg I noticed that, as of now, the Kraken2 databases at https://benlangmead.github.io/aws-indexes/k2 have been periodically updated, with the most recent one being from March 2023, while the latest version for KrakenUniq is from June 2022. I realize that the KrakenUniq databases are much larger and more difficult to create, but are there any plans for uploading an updated version of this as well?

amizeranschi avatar Apr 09 '23 16:04 amizeranschi

Yes, we do plan to update them, but they are huge, so this won't happen as often. We have several smaller, more specialized ones, so we might try to add those. Btw I don't have much funding for this, but I keep it going as best I can.

salzberg avatar Apr 09 '23 17:04 salzberg