
Combining fungi and viruses with GTDB

Status: Open · luhugerth opened this issue 3 years ago · 6 comments

Hi,

I want to create a Kraken2 DB with GTDB data, since GTDB is much more carefully curated and more reliable than NCBI. However, I do need to be able to detect all domains of life, so I want to include NCBI's fungal, viral and human genomes, which you can normally get with kraken2-build. The two approaches produce differently structured output, though: Struo2 creates one folder per genome with that genome's data inside, while kraken2/NCBI just dumps all the genomes into a common folder. Will this be a problem for building the DB? Should I write some sort of loop that moves each genome into its own folder?
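If a loop is needed, it can be small. The sketch below assumes the NCBI genomes sit as one FASTA file per genome in a single flat directory, and it moves each file into a subfolder named after its assembly accession (for example `GCF_000001405.39`), falling back to the file name minus its FASTA suffix. The folder-naming convention, the suffix list, and the helper names `genome_id` and `stash_genomes` are hypothetical; check what layout your Struo2 config actually reads before relying on this.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path

# Hypothetical convention: one subfolder per genome, named after the NCBI
# assembly accession when the file name starts with one.
ACCESSION = re.compile(r"^(GC[AF]_\d+\.\d+)")
FASTA_SUFFIXES = (".fna.gz", ".fa.gz", ".fasta.gz", ".fna", ".fa", ".fasta")


def genome_id(filename: str) -> str:
    """Accession if the name starts with one, else the name minus its FASTA suffix."""
    match = ACCESSION.match(filename)
    if match:
        return match.group(1)
    for suffix in FASTA_SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return filename


def stash_genomes(flat_dir: Path) -> list[Path]:
    """Move every FASTA file in flat_dir into flat_dir/<genome_id>/ and return the new paths."""
    moved = []
    # sorted() builds the full listing first, so creating subfolders mid-loop is safe.
    for fasta in sorted(flat_dir.iterdir()):
        if not fasta.is_file() or not fasta.name.endswith(FASTA_SUFFIXES):
            continue  # skip subfolders and non-FASTA files
        target_dir = flat_dir / genome_id(fasta.name)
        target_dir.mkdir(exist_ok=True)
        moved.append(Path(shutil.move(str(fasta), target_dir / fasta.name)))
    return moved
```

Non-FASTA files (logs, checksums) are left where they are, so the directory can be re-run safely after new downloads.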

I'm also not sure how to deal with this hybrid taxonomy, but I suppose I could select the archaeal, viral and mammalian nodes from the NCBI taxdump and append them to GTDB's?
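One way to do that append is sketched below, under several assumptions. It reads an NCBI-style `nodes.dmp` (pipe-delimited, taxid in column 1, parent taxid in column 2, rank in column 3), keeps every node under the chosen clades plus their ancestors so the clades stay attached to the root, and shifts the kept taxids by an offset. The shift is an assumption on my part: GTDB-derived taxdumps assign their own integer taxids, which can collide with NCBI's, so the offset should exceed the largest taxid in the GTDB `nodes.dmp`. It also assumes both taxdumps use taxid 1 for the root. `names.dmp` and any sequence-to-taxid mapping would need the same shift; that part is not shown. The function names are hypothetical.

```python
from collections import defaultdict, deque

# Real NCBI taxids: Viruses, Fungi, Homo sapiens. Adjust to the clades you need.
NCBI_ROOTS = [10239, 4751, 9606]


def parse_nodes(lines):
    """Parse NCBI-style nodes.dmp lines (tab-pipe-tab separated) into {taxid: [fields]}."""
    nodes = {}
    for line in lines:
        if not line.strip():
            continue
        fields = [f.strip() for f in line.rstrip("\n").split("|")]
        if fields[-1] == "":
            fields.pop()  # drop the empty cell after the trailing "|"
        nodes[int(fields[0])] = fields
    return nodes


def select_clades(nodes, roots):
    """Taxids of every node under `roots`, plus their ancestors up to the root."""
    children = defaultdict(list)
    for taxid, fields in nodes.items():
        parent = int(fields[1])
        if parent != taxid:  # the root (taxid 1) is its own parent
            children[parent].append(taxid)
    keep = set()
    queue = deque(roots)
    while queue:  # breadth-first walk down each selected clade
        taxid = queue.popleft()
        if taxid not in keep:
            keep.add(taxid)
            queue.extend(children[taxid])
    for root in roots:  # walk up so each clade stays connected to the root
        node = root
        while True:
            keep.add(node)
            parent = int(nodes[node][1])
            if parent == node:
                break
            node = parent
    return keep


def shifted_node_lines(nodes, keep, offset):
    """nodes.dmp lines for `keep` with taxids shifted by `offset`.

    Assumes the GTDB taxdump also uses taxid 1 as its root, so the root is
    shared (not shifted) and not re-emitted.
    """
    def shift(taxid):
        return 1 if taxid == 1 else taxid + offset

    lines = []
    for taxid in sorted(keep - {1}):
        fields = nodes[taxid]
        new = [str(shift(taxid)), str(shift(int(fields[1])))] + fields[2:]
        lines.append("\t|\t".join(new) + "\t|\n")
    return lines
```

Usage would be along the lines of `nodes = parse_nodes(open(ncbi_nodes_path))`, then `shifted_node_lines(nodes, select_clades(nodes, NCBI_ROOTS), offset)`, appending the result to a copy of the GTDB `nodes.dmp`. Note that NCBI's "cellular organisms" and "Eukaryota" nodes come along as ancestors of Fungi and human, so they end up as siblings of the GTDB domains under the shared root.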

Thank you very much for your time and this very nice package!

luhugerth · Oct 25 '21 10:10