Come up with source organism description for each genome

Open taltman opened this issue 4 years ago • 4 comments

Here's what NCBI has to say about assigning a meaningful entry for the "source" of the sequence material in the face of uncertainty:

https://www.ncbi.nlm.nih.gov/books/NBK53701/#gbankquickstart.can_i_use_the_word__unkn

So we need to come up with an organism descriptor for our submissions. It doesn't need to be placed precisely in the NCBI Taxonomy DB, but the nomenclature should probably not be too strange relative to the existing naming used for CoVs.

taltman avatar Jul 10 '20 10:07 taltman

Serratax provides the identity of the source organism. NCBI allows BLAST top hits as an approximate guide down to genus, which is a bad method for our situation. Serratax is much better because it reliably resolves sub-genus and species, while even genus can be wrong with BLAST (e.g. Bobbie). Perhaps this will need a discussion with GB, and possibly they won't allow it, but Serratax gives much better predictions than BLAST top hit to genus -- this is exactly why I implemented Serratax!

rcedgar avatar Jul 10 '20 13:07 rcedgar

Can we close this? Or unassign me? From my perspective Serratax is the solution.

rcedgar avatar Jul 11 '20 17:07 rcedgar

@taltman Can we close this? Or unassign me? From my perspective Serratax is the solution. If there is an open issue for me, please clarify, thanks.

rcedgar avatar Jul 22 '20 17:07 rcedgar

I think "source" in this case is the host organism, not the virus.

ababaian avatar Jul 22 '20 19:07 ababaian