kraken2
problem with downloading databases
Hello all, just to mention that downloading the bacteria-only library does not work at the moment:
"Step 1/2: Performing rsync file transfer of requested files rsync: link_stat "/all/GCF/030/866/925/GCF_030866925.1_ASM3086692v1/GCF_030866925.1_ASM3086692v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2) "
Downloading the RDP 16S database does not work either: I went to the RDP website and it is not working. It is also not listed on Google; has it closed?
Downloading archaea works.
Downloading SILVA 16S also works.
Experiencing the same problem here; it failed under the 'standard' build.
Step 1/2: Performing rsync file transfer of requested files
rsync: link_stat "/all/GCF/030/643/825/GCF_030643825.1_ASM3064382v1/GCF_030643825.1_ASM3064382v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/030/866/925/GCF_030866925.1_ASM3086692v1/GCF_030866925.1_ASM3086692v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1819) [generator=3.2.3]
rsync_from_ncbi.pl: rsync error, exiting: 5888
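To see which assemblies rsync could not find, the failing accessions can be pulled out of the log. This is a minimal sketch, assuming the rsync output was saved to a file and that the error lines look like the ones quoted above; `missing_accessions` is a hypothetical helper name, not part of Kraken 2.

```shell
# Print the assembly accession (e.g. GCF_030866925.1) for every
# rsync 'link_stat "..." failed' line read from standard input.
# Lines that do not match that pattern are ignored.
missing_accessions() {
    grep 'link_stat' \
        | sed -n 's|.*link_stat "[^"]*/\(GC[AF]_[0-9]*\.[0-9]*\)_[^"/]*_genomic\.fna\.gz".*|\1|p' \
        | sort -u
}
```

Usage, with `download.log` as a placeholder for wherever the output was captured: `missing_accessions < download.log`.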
The RDP website does not exist anymore, so it is impossible to fetch the special RDP database for classifying 16S sequences.
I downloaded the archaea library and manually added RefSeq bacteria (17,000 genomes, a 21 GB compressed file) from NCBI's new tool, NCBI Datasets (https://www.ncbi.nlm.nih.gov/datasets/).
The build needs 121 GB of free RAM and I only have 59 GB free on my computer, so I am capping the database at 55 GB using: kraken2-build --build --threads 8 --db ./database --max-db-size 55000000000
The database did build, in about 1 hour, but apparently it did not use my bacteria genomes, only the archaea. So I believe it did not find the .fna files. Maybe the folder layout, once the NCBI archive is decompressed, is not what kraken2-build expects?
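One possible explanation (an assumption, not a confirmed diagnosis) is that genomes copied into the database folder by hand are not picked up unless each FASTA file is registered with `kraken2-build --add-to-library`. The sketch below searches a directory recursively, so it does not depend on the exact NCBI Datasets folder layout. `add_fasta_to_library` and the `KRAKEN2_BUILD` variable are hypothetical names introduced here, and the taxonomy is assumed to be downloaded already so that accessions can be mapped to taxa.

```shell
# Command to run; override (for example with "echo") for a dry run.
KRAKEN2_BUILD="${KRAKEN2_BUILD:-kraken2-build}"

# add_fasta_to_library GENOME_DIR DB_DIR
# Runs "kraken2-build --add-to-library" once for every uncompressed
# *.fna file found anywhere under GENOME_DIR.
# Note: find does not report a failing -exec command through its exit
# status, so check the kraken2-build output for errors.
add_fasta_to_library() {
    genome_dir=$1
    db_dir=$2
    find "$genome_dir" -type f -name '*.fna' \
        -exec "$KRAKEN2_BUILD" --add-to-library {} --db "$db_dir" \;
}
```

Usage, with both paths as placeholders: `add_fasta_to_library ./ncbi_dataset ./database`, followed by the `kraken2-build --build` command above.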
I've downloaded and unzipped the prebuilt 16 GB / 8 GB standard databases found below, as a temporary solution.
https://benlangmead.github.io/aws-indexes/k2
I am also experiencing the same issue; the following files fail to synchronize.
rsync: link_stat "/all/GCF/000/012/405/GCF_000012405.1_ASM1240v1/GCF_000012405.1_ASM1240v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/033/372/575/GCF_033372575.1_ASM3337257v1/GCF_033372575.1_ASM3337257v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
As a result, the database (probably) does not build successfully, and when I attempt to run kraken2, I get the following error:
kraken2: database ("database") does not contain necessary file taxo.k2d
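A quick way to confirm whether a build finished is to check for the three files a Kraken 2 database needs: `hash.k2d`, `opts.k2d` and `taxo.k2d`. The following is a small sketch; `check_k2_db` is a hypothetical helper name.

```shell
# check_k2_db DB_DIR
# Prints one line for each required database file that is missing or
# empty in DB_DIR, and returns non-zero if any of them is.
check_k2_db() {
    db_dir=$1
    missing=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing: $db_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_k2_db ./database && echo "database looks complete"`.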
Perhaps NCBI has updated their repository (?); I was able to proceed without rsync errors today for the bacteria genomes.
kraken2-build --download-library bacteria
The plasmid DB is not working: Kraken2 is using FTP mode even when you're not requesting that option:
kraken2-build --download-library plasmid --no-masking --threads 8 --db contaminant_kraken2
@MixalisSn you need to run kraken2-build --download-taxonomy --db MYDB first
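The order of steps implied here (taxonomy first, then the library, then the build) can be sketched as a small wrapper. It only uses flags that appear in this thread; `build_bacteria_db` and the `KRAKEN2_BUILD` variable are hypothetical names, and the wrapper stops at the first step that fails.

```shell
# Command to run; override (for example with "echo") for a dry run.
KRAKEN2_BUILD="${KRAKEN2_BUILD:-kraken2-build}"

# build_bacteria_db DB_DIR [THREADS]
# Downloads the taxonomy, then the bacteria library, then builds.
# Each step runs only if the previous one succeeded.
build_bacteria_db() {
    db_dir=$1
    threads=${2:-1}
    "$KRAKEN2_BUILD" --download-taxonomy --db "$db_dir" &&
    "$KRAKEN2_BUILD" --download-library bacteria --db "$db_dir" &&
    "$KRAKEN2_BUILD" --build --threads "$threads" --db "$db_dir"
}
```

Usage: `build_bacteria_db MYDB 8`.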
@AlexandreThibodeauUdM RDP is no longer being supported unfortunately.
For bacteria, this error results when NCBI is in the middle of updating their database files and the assembly_summary.txt has not been updated yet. It should work fine after a couple of days.
@maxmaronna the plasmid download is different from the RefSeq downloads. I'll check on the issue.
@jenniferlu717 @tdfy Indeed, after some days, the database was downloaded successfully. Thank you very much for your support and replies.