Struo2
Struo2 copied to clipboard
Combining fungi and viruses with GTDB
Hi,
I want to create a Kraken2 DB with GTDB data, since it's so much more curated and reliable than NCBI. However, I do need to be able to detect all domains of life, so I want to include NCBI's fungal, viral and human genomes that you can normally get with kraken2-build
. The structure of the output is a bit different with these two approaches, though; Struo2 creates a folder per genome with data within that folder, while kraken2/NCBI just dumps the genomes into a common folder. Will this be a problem for building the DB? Should I make some sort of loop to stash each genome into its folder?
I'm also not sure how to deal with these hybrid taxonomy, but I suppose I could select the archaeal, viral and mammalian nodes from the NCBI taxdump and append these to GTDB's?
Thank you very much for your time and this very nice package!