Allow Amino Acid Contig Databases
I've added a flag, --allow-amino-acid-contig-db, to the anvi-gen-contigs-database module. I needed to allow sequences that have already been gene-called and translated when reannotating databases and supplying an external gene calls file. I've also updated the documentation to reflect this and added a warning that the FASTA must be in an annotated protein sequence format. If it isn't clear in the docs, please let me know and I can try to clarify. I figured someone else may have a use for this if they are also annotating existing resources broadly.
This runs locally, but if there are any additional tests I should run then please let me know!
Hey @Kekananen,
Thank you very much for this. Many people have asked us to consider doing this, and we have avoided it all these years since it really turns the 'contigs-db' into something a bit different. But I'm sure having that kind of functionality for this artifact would make people who work with obscure branches of life (cough eukaryotes cough or their viruses cough cough) very happy.
I think there are a few things we all would need to think about. The first one that comes to mind is the following: since the contigs-db artifact is such a central piece in the anvi'o ecosystem (i.e., look at all the programs that consume it), there are many, many downstream processes that depend on it. So at the very least, we would need to ensure that each contigs-db knows its type (we have a db-variant variable in the self table, and it can be used to mark contigs-db files that have AA sequences, I guess). This way we can limit the application of some tools to those db-variants, or adjust others.
For instance, what happens when you run anvi-run-ncbi-cogs on a contigs-db of AA sequences? What happens when you create an external genomes file from a few of them and then run anvi-gen-genomes-storage? Or what happens when you run anvi-run-scg-taxonomy or anvi-display-contigs-stats on it?
I think these would be useful first-pass investigations :)
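To make the db-variant idea above a bit more concrete, here is a minimal sketch of how a program might check the variant before running. It is not anvi'o code: it assumes the self table is a simple key/value table, that the variable is stored under the key `db_variant`, and that AA databases would be marked with the value `amino-acid`. The helper names (`get_db_variant`, `require_nucleotide_db`, `make_demo_db`) are hypothetical, and the demo uses an in-memory SQLite database rather than a real contigs-db.

```python
import sqlite3

# Hypothetical label for contigs-db files that hold AA sequences;
# the actual value has not been decided in this thread.
AA_VARIANT = "amino-acid"


def get_db_variant(conn):
    """Return the value stored under 'db_variant' in the self table, or None."""
    row = conn.execute(
        "SELECT value FROM self WHERE key = 'db_variant'"
    ).fetchone()
    return row[0] if row else None


def require_nucleotide_db(conn, program_name):
    """Raise if the database is marked as holding amino acid sequences."""
    if get_db_variant(conn) == AA_VARIANT:
        raise RuntimeError(
            f"{program_name} cannot run on a contigs-db of amino acid sequences."
        )


def make_demo_db(variant=None):
    """Build an in-memory stand-in for a contigs-db (assumed key/value layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE self (key TEXT PRIMARY KEY, value TEXT)")
    if variant is not None:
        conn.execute("INSERT INTO self VALUES ('db_variant', ?)", (variant,))
    return conn


if __name__ == "__main__":
    aa_db = make_demo_db(AA_VARIANT)
    try:
        require_nucleotide_db(aa_db, "some-nucleotide-only-program")
    except RuntimeError as error:
        print(error)
```

A guard like this would let nucleotide-only programs fail early with a clear message, while programs that can handle AA sequences could branch on the same value instead.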
That makes total sense. Let me do a little bit more testing on additional modules (will use the [docs](https://anvio.org/help/main/artifacts/contigs-db/) as a guide) to get a sense of what might be breaking. For now, I'll convert this back to a draft and ping you when I have some decent results :)