Support for GTDB taxonomy?
Checklist
- [X] There are no similar issues or pull requests for this yet.
- [X] The request is not specific for MICOM Qiime 2 plugin (q2-micom)
Is your feature related to a problem? Please describe it.
The Genome Taxonomy Database (GTDB) is comprehensive (especially the new v202 release) and more robust than the NCBI microbial taxonomy, largely because the GTDB taxonomy is based entirely on genome phylogenetic relatedness.
Although the MICOM docs are vague about the taxonomy that one must use, it appears that the NCBI taxonomy is required.
Describe the solution you would like.
Provide direct support for the GTDB taxonomy.
MICOM doesn't really set any requirements for the taxonomy, but you are right that you usually need the taxonomy of your data to match the taxonomy of the model database.
I also thought about providing the model databases with different taxonomies but haven't found a good way to map NCBI taxon IDs to GTDB ones. If you know of a way to do so, that would be great. Otherwise, we would have to get all the original genomes from the database and classify them, which would be pretty involved because it is not straightforward to get the genomes for the AGORA models, for instance.
> I also thought about providing the model databases with different taxonomies but haven't found a good way to map NCBI taxon IDs to GTDB ones
You could use or build on a simple script that I wrote to map the NCBI taxonomy to the GTDB taxonomy: ncbi-gtdb_map.py. It simply uses the metadata provided by the GTDB, which includes NCBI and GTDB taxonomies for each genome.
If you need to map at the taxid level, some of the other scripts in that repo might be useful.
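For illustration, the metadata-based idea can be sketched roughly as below. This is not the actual `ncbi-gtdb_map.py` script, just a minimal sketch assuming the GTDB metadata is a tab-separated table with `ncbi_taxonomy` and `gtdb_taxonomy` columns. The function name `ncbi_to_gtdb_map`, the majority-vote rule, and the example rows are all hypothetical placeholders.

```python
import csv
import io
from collections import Counter, defaultdict


def ncbi_to_gtdb_map(metadata_tsv, ncbi_col="ncbi_taxonomy", gtdb_col="gtdb_taxonomy"):
    """Map each NCBI lineage to the GTDB lineage most often paired with it.

    `metadata_tsv` is any open text file (or file-like object) holding a
    tab-separated table with a header row. The column names are assumptions
    about the GTDB metadata layout and can be overridden.
    """
    votes = defaultdict(Counter)
    for row in csv.DictReader(metadata_tsv, delimiter="\t"):
        ncbi = (row.get(ncbi_col) or "").strip()
        gtdb = (row.get(gtdb_col) or "").strip()
        # Skip genomes that lack either lineage.
        if not ncbi or not gtdb or ncbi.lower() == "none":
            continue
        votes[ncbi][gtdb] += 1
    # Majority vote: one NCBI lineage may be split across several GTDB lineages.
    return {ncbi: counts.most_common(1)[0][0] for ncbi, counts in votes.items()}


# Made-up placeholder rows, only to show the expected table shape.
example = io.StringIO(
    "accession\tncbi_taxonomy\tgtdb_taxonomy\n"
    "genome_1\tg__OldGenus;s__OldGenus alpha\tg__NewGenus;s__NewGenus alpha\n"
    "genome_2\tg__OldGenus;s__OldGenus alpha\tg__NewGenus;s__NewGenus alpha\n"
    "genome_3\tg__OldGenus;s__OldGenus alpha\tg__OtherGenus;s__OtherGenus beta\n"
    "genome_4\tnone\tg__NewGenus;s__NewGenus gamma\n"
)
mapping = ncbi_to_gtdb_map(example)
for ncbi_lineage, gtdb_lineage in mapping.items():
    print(ncbi_lineage, "->", gtdb_lineage)
```

With a real metadata file, you would pass an open file handle instead of the in-memory example. A majority vote is only one possible way to resolve NCBI lineages that are split across several GTDB lineages, and ambiguous cases may deserve manual review.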
Oh cool, will try with that one.