bracken-build "database library does not exist" for custom database

Open adec37 opened this issue 2 years ago • 8 comments

I used the GTDB database (https://gtdb.ecogenomic.org/) for kraken2, which consists of the 'hash.k2d', 'opts.k2d', 'seqid2taxid.map' and 'taxo.k2d' files. I am now trying to run bracken-build as follows:

/users/a/d/adecola/anaconda3/envs/bracken/bracken-build -d /users/a/d/adecola/anaconda3/envs/kraken2/GTDB/ -k 35 -l 2000 -x /users/a/d/adecola/anaconda3/bin/

I am getting the following output:

Selected Options:
kmer length = 35
read length = 2000
database = /users/a/d/adecola/anaconda3/envs/kraken2/GTDB/
threads = 1
Checking for Valid Options...
ERROR: Database library /users/a/d/adecola/anaconda3/envs/kraken2/GTDB/library does not exist

I do not have the 'library' folder that is generated by the kraken2 standard database as I downloaded an external database. Any advice is much appreciated.

adec37 avatar Aug 08 '22 16:08 adec37

Yep, same error here - downloaded Kraken2 database from the official page. There are pre-calculated Bracken files in there, but no "library" folder. I don't think I've ever used a Kraken2 DB with a library folder?

apredeus avatar Aug 10 '22 01:08 apredeus

You cannot make the database files without the library/ folder, so whoever built the original Kraken database will need to make the pre-calculated Bracken files.

jenniferlu717 avatar Oct 11 '22 18:10 jenniferlu717

I am also still having issues with Bracken because I do not have the database library. I downloaded the PlusPF database from https://benlangmead.github.io/aws-indexes/k2 and was able to run Kraken2 and make Krona plots with no real trouble. However, I'm stuck trying to use Bracken. I have the correct paths to my database and the kraken2 program, and the other options are the defaults. Additionally, I was able to build a taxonomy subdirectory using kraken2-build --build-taxonomy. Is there a place to download the "library" folder, or a script to generate it?

jtsteyer93 avatar May 01 '23 20:05 jtsteyer93

Facing the exact issue as @jtsteyer93 except I have the Standard-8 db.

nityendra21 avatar May 12 '23 06:05 nityendra21

I am having the same issue as @jtsteyer93, has anyone been able to solve it?

merytouceda avatar May 22 '23 21:05 merytouceda

It's solved. To build the bracken database ref file with bracken-build, you need the library/ and taxonomy/ folders produced by the kraken-build commands. The taxonomy is easy to reproduce; however, the library/ folder is, I think, a bunch of kmer hash files that are specific to your database and cannot be reproduced without the reference genomes used to build the kraken database (i.e. those used to produce the 'hash.k2d', 'opts.k2d', 'seqid2taxid.map' and 'taxo.k2d' files). See lines 174-186 in the bracken-build script (https://github.com/jenniferlu717/Bracken/blob/master/bracken-build). You can only solve this issue by re-building the kraken database. It's not too bad to build a standard Kraken database, but it does require some runtime and computational power. Refer to the kraken manual for how to do this: https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown.
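
A quick way to tell up front whether a given database directory can be used with bracken-build is to check for those two folders. This is only a rough sketch: check_bracken_ready is a hypothetical helper name, not something shipped with Bracken or Kraken, and it only tests that the folders exist, not that their contents are complete.

```shell
# Hypothetical helper (not part of Bracken): report which of the folders
# that bracken-build needs are missing from a Kraken database directory.
check_bracken_ready() {
    db="$1"
    missing=0
    for sub in library taxonomy; do
        if [ ! -d "$db/$sub" ]; then
            echo "missing: $db/$sub"
            missing=1
        fi
    done
    # Exit status 0 means both folders are present, 1 means at least one is not.
    return "$missing"
}
```

For example, check_bracken_ready /path/to/db && echo "ok to run bracken-build". A pre-built download that only contains the .k2d files would be reported as missing library/.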

The workflow is described in the Bracken documentation: build the kraken database, build the ref files for bracken, then classify with kraken and run bracken. Warning: do not run kraken-build --clean before building all the required bracken files, as this will remove the contents of library/.
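
Roughly, that order of steps looks like the sketch below. Everything specific in it is an assumption for illustration: the database path, thread count, read length, the choice of the bacteria library, and the sample file names are placeholders, and the run wrapper and bracken_workflow function are hypothetical names. By default it only prints the commands (DRY_RUN=1) so you can review them before committing to a long build.

```shell
#!/bin/sh
# Sketch only: all values below are placeholders, adjust to your setup.
DB="${DB:-./my_kraken_db}"      # hypothetical database directory
THREADS="${THREADS:-4}"
READ_LEN="${READ_LEN:-100}"     # should match the read length of your data
DRY_RUN="${DRY_RUN:-1}"         # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bracken_workflow() {
    # 1. Build the Kraken 2 database (this creates taxonomy/ and library/).
    run kraken2-build --download-taxonomy --db "$DB"
    run kraken2-build --download-library bacteria --db "$DB"
    run kraken2-build --build --db "$DB" --threads "$THREADS"
    # 2. Build the Bracken ref files while library/ still exists.
    run bracken-build -d "$DB" -t "$THREADS" -k 35 -l "$READ_LEN"
    # 3. Classify with Kraken 2, then estimate abundances with Bracken.
    run kraken2 --db "$DB" --threads "$THREADS" --report sample.kreport --output sample.kraken sample.fastq
    run bracken -d "$DB" -i sample.kreport -o sample.bracken -r "$READ_LEN" -l S
    # Only after all Bracken files are built is it safe to clean up:
    #   kraken2-build --clean --db "$DB"
}
```

Calling bracken_workflow prints the six commands in order; setting DRY_RUN=0 before sourcing the script would actually run them, which needs kraken2 and Bracken on your PATH plus a fair amount of disk space and time.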

JuliaMcGonigle avatar Jul 18 '23 16:07 JuliaMcGonigle

Yes, I did it that way before the databases were released and was able to get bracken to work then. It seems counterproductive to have released databases that were built with their own parameters but do not work with their own pipeline. That's why I am confused about doing the analysis with this database versus what you are suggesting.

While downloading and building the databases through kraken itself is possible (though I have consistent issues with the standard database download and the rsync/use-ftp options), it doesn't seem like the most effective way to do it.

jtsteyer93 avatar Jul 18 '23 17:07 jtsteyer93

Yeah I would agree that the bracken build files should have just been made and released with the kraken database k2d files. I don't think there is a way to get them from just the k2d files after digging around in the bracken-build script. Seems odd they weren't released together when these programs rely on each other and improve results. Re-building is sadly the only solution to this issue.

I was just throwing it out as a solution for folks who seem to still be having this problem if they still want to use bracken for their analysis, as I just recently also came across it. I used rsync to get my genomes from NCBI RefSeq/GenBank rather than the standard db download through kraken-build and that worked ok for me with minor issues.

JuliaMcGonigle avatar Jul 18 '23 18:07 JuliaMcGonigle