
suitable for CDS and/or contig taxonomic assignment?

Open · AstrobioMike opened this issue on May 22 '23 · 2 comments

Hey there, @shenwei356 :)

Thanks again, not only for more wonderful software but also for excellent documentation, as usual!

I'm looking for a better way to taxonomically classify predicted coding sequences (CDSs) and contigs from metagenomic assemblies (I currently use CAT with NCBI's nr).

I'd really like to use GTDB for bacteria/archaea and combine it with eukaryotes from NCBI, so your infrastructure enabling that sort of thing is really appealing to me 🙏

I see in issue https://github.com/shenwei356/kmcp/issues/27 you note that KMCP is not suitable for long reads. Is your thinking similar for assembled contigs too?

And would you have the same concerns about taxonomically classifying predicted coding sequences (which might average around 800-1,000 bases)?

If only one of those is feasible, I imagine it could be used to infer the other. E.g., if contigs are doable, each CDS could simply inherit its source contig's taxonomy; and if CDSs are doable, some consensus approach could assign each contig a taxonomy based on its CDSs.
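To make the two directions concrete, here is a minimal sketch in plain Python. It is not part of KMCP or CAT. It assumes each CDS has already been given a lineage (a list of taxon names ordered from domain downward) by some per-CDS classifier, or `None` if unclassified. The function names `contig_consensus` and `propagate_to_cds` and the `min_fraction` cutoff are hypothetical. The consensus rule is a simple majority vote applied rank by rank: the contig keeps descending while a single taxon is supported by more than `min_fraction` of all its CDSs.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: a lineage is an ordered list of taxon names,
# highest rank first, e.g. ["Bacteria", "Pseudomonadota", "Gammaproteobacteria"].
Lineage = Sequence[str]


def contig_consensus(cds_lineages: List[Optional[Lineage]],
                     min_fraction: float = 0.5) -> List[str]:
    """Return the deepest lineage supported by > min_fraction of a contig's CDSs.

    Unclassified CDSs (None) still count toward the denominator, so contigs
    with few classified CDSs get a shallower, more conservative assignment.
    """
    total = len(cds_lineages)
    if total == 0:
        return []
    consensus: List[str] = []
    depth = 0
    while True:
        # Only CDSs that agree with the consensus so far and that reach this
        # depth get to vote at the current rank.
        votes = Counter(
            lin[depth]
            for lin in cds_lineages
            if lin is not None
            and len(lin) > depth
            and list(lin[:depth]) == consensus
        )
        if not votes:
            break
        taxon, count = votes.most_common(1)[0]
        if count / total <= min_fraction:
            break  # no sufficiently supported taxon at this rank; stop here
        consensus.append(taxon)
        depth += 1
    return consensus


def propagate_to_cds(contig_tax: Dict[str, List[str]],
                     cds_to_contig: Dict[str, str]) -> Dict[str, List[str]]:
    """Give every CDS its source contig's lineage (empty list if unknown)."""
    return {cds: contig_tax.get(contig, []) for cds, contig in cds_to_contig.items()}


if __name__ == "__main__":
    cds = [
        ["Bacteria", "Pseudomonadota", "Gammaproteobacteria"],
        ["Bacteria", "Pseudomonadota", "Alphaproteobacteria"],
        ["Bacteria", "Pseudomonadota", "Gammaproteobacteria"],
        None,  # an unclassified CDS
    ]
    print(contig_consensus(cds))
```

With the four example CDSs, the class-level vote is only 2 of 4, which is not more than half, so the contig stops at `["Bacteria", "Pseudomonadota"]`. Counting unclassified CDSs in the denominator is one deliberately conservative choice; weighting votes by alignment or k-mer scores would be a reasonable alternative.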

Sorry if I've missed where you've already covered this elsewhere, and thanks for any thoughts!

AstrobioMike · May 22 '23 18:05