
problem with downloading databases

Open AlexandreThibodeauUdM opened this issue 1 year ago • 11 comments

Hello all, just to mention that downloading the database for bacteria only does not work at the moment:

"Step 1/2: Performing rsync file transfer of requested files rsync: link_stat "/all/GCF/030/866/925/GCF_030866925.1_ASM3086692v1/GCF_030866925.1_ASM3086692v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2) "

Downloading the RDP 16S database does not work either: I went to the RDP web site and it is not working. It is also not listed on Google; has it closed?

Downloading archaea works.

Downloading SILVA 16S also works.

AlexandreThibodeauUdM avatar Nov 10 '23 15:11 AlexandreThibodeauUdM

Experiencing the same problem here; it failed under the 'standard' build.

Step 1/2: Performing rsync file transfer of requested files
rsync: link_stat "/all/GCF/030/643/825/GCF_030643825.1_ASM3064382v1/GCF_030643825.1_ASM3064382v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/030/866/925/GCF_030866925.1_ASM3086692v1/GCF_030866925.1_ASM3086692v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1819) [generator=3.2.3]
rsync_from_ncbi.pl: rsync error, exiting: 5888

tdfy avatar Nov 10 '23 16:11 tdfy

The RDP website no longer exists, so it is impossible to fetch the special RDP database for classifying 16S sequences.

AlexandreThibodeauUdM avatar Nov 14 '23 15:11 AlexandreThibodeauUdM

Downloaded archaea and manually added RefSeq bacteria (17,000 genomes, 21 GB compressed file) from NCBI's new tool, NCBI Datasets (https://www.ncbi.nlm.nih.gov/datasets/).

Building the database needs 121 GB of free RAM and I only have 59 GB free on my computer, so I am reducing it to 55 GB using: kraken2-build --build --threads 8 --db ./database --max-db-size 55000000000

AlexandreThibodeauUdM avatar Nov 15 '23 20:11 AlexandreThibodeauUdM

The database did build. It took 1 hour, but apparently it did not use my bacterial genomes, only the archaea. So I believe it did not find the .fna files. Maybe the folder structure, once the NCBI download is decompressed, is not what the build expects?
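For anyone in the same situation, a minimal sketch of one thing to try. Kraken 2's documented way to include your own genomes is `kraken2-build --add-to-library`, one FASTA file at a time. The sketch assumes the unpacked NCBI Datasets archive keeps its genomes as `*.fna` files somewhere below a folder such as `./ncbi_dataset/data` (an assumption about the layout, not something confirmed in this thread), and that the sequence headers carry NCBI accessions that the downloaded taxonomy can map. The function names `list_fna` and `add_genomes` are made up for illustration.

```shell
# List every genomic FASTA file below a directory, one path per line.
list_fna() {
    find "$1" -type f -name '*.fna' | sort
}

# Register each FASTA file with the Kraken 2 library of database "$2".
# Assumes kraken2-build is on PATH and the taxonomy was downloaded already.
add_genomes() {
    local src=$1 db=$2
    list_fna "$src" | while IFS= read -r fasta; do
        kraken2-build --add-to-library "$fasta" --db "$db"
    done
}

# Example call (paths are placeholders):
#   add_genomes ./ncbi_dataset/data ./database
```

Running `list_fna` on its own first shows whether the files are where the sketch expects them before anything is added to the library.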

AlexandreThibodeauUdM avatar Nov 16 '23 15:11 AlexandreThibodeauUdM

I've downloaded and unzipped the 16/8 std dbs found below. Temporary solution.

https://benlangmead.github.io/aws-indexes/k2

tdfy avatar Nov 16 '23 16:11 tdfy

I am also experiencing the same issue, as the following fail to synchronize.

rsync: link_stat "/all/GCF/000/012/405/GCF_000012405.1_ASM1240v1/GCF_000012405.1_ASM1240v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/033/372/575/GCF_033372575.1_ASM3337257v1/GCF_033372575.1_ASM3337257v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)

As a result, the database (probably) does not build successfully, and when I attempt to run kraken2, I get the following error: kraken2: database ("database") does not contain necessary file taxo.k2d
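A quick way to see whether a build actually finished, sketched under the assumption that a usable Kraken 2 database folder holds the three index files `hash.k2d`, `opts.k2d` and `taxo.k2d`. The helper name `missing_k2d` is hypothetical.

```shell
# Print each expected index file that is absent or empty in database "$1".
# Returns 0 when all three are present, 1 otherwise.
missing_k2d() {
    local db=$1 f missing=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db/$f" ]; then
            echo "missing: $db/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (path is a placeholder):
#   missing_k2d ./database || echo "database is incomplete"
```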

MixalisSn avatar Nov 16 '23 22:11 MixalisSn

Perhaps NCBI has updated their repository (?); I was able to download the bacteria genomes today without rsync errors.

kraken2-build --download-library bacteria

tdfy avatar Nov 20 '23 20:11 tdfy

The plasmid DB is not working: Kraken2 is using FTP mode even when you're not requesting that option:

kraken2-build --download-library plasmid --no-masking --threads 8 --db contaminant_kraken2

maxmaronna avatar Nov 25 '23 18:11 maxmaronna

@MixalisSn you need to run kraken2-build --download-taxonomy --db MYDB first
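The order of steps can be sketched as below. This only prints the commands so they can be reviewed first; it does not run them. `k2_plan` is a made-up helper name, the database and library names are placeholders, and paths containing spaces are not handled.

```shell
# Print the three kraken2-build steps in the order they need to run:
# taxonomy first, then the library, then the build itself.
k2_plan() {
    local db=$1 lib=$2 threads=${3:-1}
    printf 'kraken2-build --download-taxonomy --db %s\n' "$db"
    printf 'kraken2-build --download-library %s --db %s\n' "$lib" "$db"
    printf 'kraken2-build --build --threads %s --db %s\n' "$threads" "$db"
}

# Example call:
#   k2_plan MYDB bacteria 8
```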

jenniferlu717 avatar Dec 08 '23 19:12 jenniferlu717

@AlexandreThibodeauUdM RDP is no longer being supported unfortunately.

For bacteria, this error results when NCBI is in the middle of updating their database files and the assembly_summary.txt has not been updated yet. It should work fine after a couple of days.
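Since the fix is to wait and try again, a generic retry wrapper may save some manual re-running. This is a plain shell sketch, not part of Kraken 2: `retry` is a hypothetical helper, and the attempt count and delay in the example are arbitrary choices.

```shell
# Run a command up to "$1" times, sleeping "$2" seconds between failures.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    local attempts=$1 delay=$2 n=1
    shift 2
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed; retrying in ${delay}s" >&2
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example call: try once a day (86400 s) for up to three days.
#   retry 3 86400 kraken2-build --download-library bacteria --db ./database
```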

@maxmaronna the plasmid download is different from the Refseq downloads. I'll check on the issue.

jenniferlu717 avatar Dec 08 '23 19:12 jenniferlu717

@jenniferlu717 @tdfy Indeed, after some days, the database was downloaded successfully. Thank you very much for your support and replies.

MixalisSn avatar Dec 10 '23 18:12 MixalisSn