tools-iuc

Bracken Data Manager: Add support for pre-built DBs

Open dfornika opened this issue 4 years ago • 3 comments

Issue #4073 describes the need to update the links that the Kraken2 data manager uses to pull the pre-built 'MiniKraken2' database. The 'canonical' place to find that database is now the 'Index Zone' created by Ben Langmead.

In Pull-Request #4139, I've updated the MiniKraken2 URL and also added support for downloading the three 'Standard' Kraken2 databases available on the Index Zone.

All of the Kraken2 downloads from the Index Zone also include pre-built Bracken .kmer_distrib files for a set of read lengths. If the pull request above is merged, then support for pre-built Bracken databases should be added to the Bracken data manager.

dfornika avatar Nov 04 '21 23:11 dfornika

Hi @dfornika and @bernt-matthias, GA is working on setting up the pre-built index DBs for the Bracken tool. We've found that the Bracken indices come with different k-mer sizes (e.g. 50, 75, 100, etc.). However, the Bracken index on GE has only a single entry available for each DB (standard, pluspfp, etc.), e.g. "Prebuild RefSeq indexes:PlusPFP (version:2022-06-07, K-mer:35, Read:100)". As I understand it, the DB was built with k-mer length 35, which I can see in the inspect.txt file inside the directory. I am confused by the "Read:100", though. Does it mean the 100mers database is used? Many thanks.

mthang avatar Jan 19 '24 02:01 mthang

Bracken needs to know the (approximate) size of the reads. I don't understand the technical reason for it, but the Bracken db/index is built for a specific read length.

If you download one of the Kraken2 databases from Ben Langmead's 'Index Zone' site, it comes with a set of Bracken indexes for specific read lengths.
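
To make the read-length point concrete, here is a rough sketch (not part of the data manager) of how one might pick the pre-built index for a sample. It assumes the files follow the `database<N>mers.kmer_distrib` naming pattern, where `<N>` is the read length. The function names are hypothetical, and choosing the closest available length is my own heuristic rather than documented Bracken behaviour.

```python
import re

# Assumed naming pattern of the pre-built Bracken files: database<N>mers.kmer_distrib
KMER_DISTRIB_RE = re.compile(r"^database(\d+)mers\.kmer_distrib$")


def available_read_lengths(filenames):
    """Return the sorted read lengths for which a .kmer_distrib file exists."""
    lengths = []
    for name in filenames:
        match = KMER_DISTRIB_RE.match(name)
        if match:
            lengths.append(int(match.group(1)))
    return sorted(lengths)


def closest_read_length(filenames, sample_read_length):
    """Pick the available read length nearest to the sample's read length.

    Ties are broken in favour of the shorter read length (an arbitrary choice).
    """
    lengths = available_read_lengths(filenames)
    if not lengths:
        raise ValueError("no database<N>mers.kmer_distrib files found")
    return min(lengths, key=lambda n: (abs(n - sample_read_length), n))
```

For example, with 2x150 bp reads and a folder containing the 100mers, 150mers and 200mers files, `closest_read_length(...)` would select 150, which is the value you would then pass as the read length to Bracken.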

dfornika avatar Jan 19 '24 02:01 dfornika

Thank you for your speedy response! We downloaded both Kraken2 DBs from 6/7/2022, which already come with the Bracken DB/index files. The following Bracken indices are in the folder. I wonder which one to use in the .loc file when setting up the Bracken DB for the Bracken tool on Galaxy.

  • database100mers.kmer_distrib
  • database150mers.kmer_distrib
  • database200mers.kmer_distrib
  • database250mers.kmer_distrib
  • database300mers.kmer_distrib
  • database50mers.kmer_distrib
  • database75mers.kmer_distrib

mthang avatar Jan 19 '24 02:01 mthang