ncbitaxon
What are GC_IDs?
Most terms have an xref to a namespace with a prefix GC_ID. Is anyone familiar with what that is or what it abbreviates?
I have a partial answer. The ncbitaxon.owl file is a direct translation of the taxdmp.zip file available here: https://ftp.ncbi.nih.gov/pub/taxonomy/. In that directory is a taxdmp_readme.txt that explains the various fields. "GC" is their abbreviation for "genetic code", and the GC ID points to an entry in a gencode.dmp file that we do not translate. Official NCBI Taxonomy pages include a "Genetic code" field with a link, e.g. https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606&lvl=3&lin=f&keep=1&srchmode=1&unlock. That's as much as I know.
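To make the origin of the GC_ID xref concrete, here is a minimal sketch of pulling the genetic code IDs out of one row of nodes.dmp from taxdmp.zip. It assumes the usual taxdump layout (fields separated by tab-pipe-tab, rows ending in tab-pipe, genetic code ID in the 7th field and mitochondrial genetic code ID in the 9th); check the readme before relying on those positions. The function name and the sample row are illustrative, not taken from the ncbitaxon build code.

```python
# Sketch only: field positions are an assumption based on the taxdump readme.
FIELD_SEP = "\t|\t"   # separator between fields
ROW_END = "\t|"       # terminator at the end of each row

def parse_nodes_row(line: str) -> dict:
    """Return tax_id and genetic code IDs from one nodes.dmp row."""
    line = line.rstrip("\n")
    if line.endswith(ROW_END):
        line = line[: -len(ROW_END)]
    fields = line.split(FIELD_SEP)
    return {
        "tax_id": int(fields[0]),
        "genetic_code_id": int(fields[6]),       # assumed: 7th field
        "mito_genetic_code_id": int(fields[8]),  # assumed: 9th field
    }

# Illustrative row in the shape of the Homo sapiens (9606) entry.
SAMPLE_ROW = "\t|\t".join(
    ["9606", "9605", "species", "HS", "5", "1", "1", "1", "2", "1", "1", "0", ""]
) + "\t|\n"

row = parse_nodes_row(SAMPLE_ROW)
xref = f"GC_ID:{row['genetic_code_id']}"  # the xref form seen on ncbitaxon terms
print(row["tax_id"], xref)
```

The same splitting logic should apply to gencode.dmp, since the taxdump files share the delimiter convention, but its columns are different.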
Thanks @jamesaoverton, that's much appreciated. It's unbelievable how many nomenclatures the NCBI has generated...
FWIW UMLS doesn't translate this either https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NCBI/sourcerepresentation.html
I suggest
- Register something like NCBI.gc with identifiers.org / n2t.net
- Point this at URLs like https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG2 or https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG2
- have an annotation/xref pointing to this
- (stretch) have some kind of ontological rendering of gencode.dmp (btw, did the file move? I don't see it). E.g.
  - taxon has-part (nuclear genome and has-part some translation system GC_ID)
  - GC_ID a SP:codon, label "ATG", starts-with some (adenine and followed-by thymine and ends-with guanine), encodes chebi:methionine
  - this injects a bunch of blank nodes into the ontology with no real priority use case and would be for the sake of ontological completeness, so YMMV....
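As a rough illustration of the second bullet, a GC_ID xref could be expanded to a wprintgc.cgi link like this. It is only a sketch: the `#SG<n>` anchor pattern is extrapolated from the single `#SG2` example above and has not been checked for every genetic code, and `gc_xref_to_url` is a hypothetical helper name.

```python
# Sketch only: assumes every genetic code n has an anchor named SG<n>.
BASE_URL = "https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi"

def gc_xref_to_url(xref: str) -> str:
    """Turn an xref such as 'GC_ID:2' into a link to the genetic code page."""
    prefix, _, local_id = xref.partition(":")
    if prefix != "GC_ID" or not local_id.isdigit():
        raise ValueError(f"not a GC_ID xref: {xref!r}")
    return f"{BASE_URL}#SG{int(local_id)}"

print(gc_xref_to_url("GC_ID:2"))
```

A registered prefix would let a resolver do this expansion instead of each consumer hard-coding the URL.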
FYI: This has been registered in the Bioregistry at http://bioregistry.io/registry/gc