Some AspGD taxon data appears without a label

Open ValWood opened this issue 5 years ago • 12 comments

There is something amiss with the way AspDB annotations appear in AmiGO.

The taxon isn't parsed correctly, so these entries are not available in the organism filter, e.g.:

http://amigo.geneontology.org/amigo/gene_product/AspGD:Aspfo1_0204585

ValWood avatar Jun 25 '19 14:06 ValWood

@ValWood That's a good catch--thank you. I'll look into this today.

kltm avatar Jun 25 '19 14:06 kltm

Examining similarly formed entries from the GAF, this was not a uniform problem: http://amigo.geneontology.org/amigo/gene_product/AspGD:Aspka1_0181639

To note, the issue here seems to be that in some cases the taxon ID does not get resolved to a label, so the main taxon entry is left "blank" and the table shows the bare ID instead.

As this is a relatively new annotation done on the day of the release, I wonder if the NCBI taxon ontology could somehow have been out of sync with the annotations, leading to a case where the label went AWOL.

kltm avatar Jun 25 '19 15:06 kltm

That theory is at least partially bum, as 2019-06-23 http://amigo-exp.geneontology.io/amigo/gene_product/AspGD:Aspfo1_0204585 still has the information gap.

kltm avatar Jun 25 '19 15:06 kltm

It might be something to do with taxon strain IDs vs strain IDs (some species have strain IDs in NCBI). I'm not completely sure what these particular IDs are but it's a possibility.

@marekskrzypek might be able to enlighten you?

ValWood avatar Jun 25 '19 16:06 ValWood

It isn't restricted to AspDB:

http://amigo.geneontology.org/amigo/gene_product/CGD:CORT_0G01250

ValWood avatar Jun 25 '19 17:06 ValWood

@ValWood It seems to be the same taxon though: NCBITaxon:1136231, which is a good thing. I do not think the problem resides in the GAF; it is more likely in the loader or in the NCBITaxon file that we load.

kltm avatar Jun 25 '19 17:06 kltm

Noting from load log:

[2019-06-10T12:09:23.763Z] 2019-06-10 12:09:23,648 INFO  (GafSolrDocumentLoader:189) Skipping taxon closures for unknown id: NCBITaxon:1136231

That's owltools, around

		final OWLClass taxCls = graph.getOWLClassByIdentifier(taxonId);

within bioentity solr document assembly. That would point to an issue in the ontology, then. @balhoff Would you be able to officially confirm whether or not NCBITaxon:1136231 is present in "http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl"? Grepping shows that it is not there. If it is missing, what are the channels to add it?
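
For anyone wanting to repeat the check for other taxa, here is a minimal sketch of the two steps above: pull the unknown IDs out of a load log, then test whether each one is mentioned in the ontology text. The function names are hypothetical, the log message pattern is taken from the single line quoted above, and the CURIE-to-IRI mapping assumes the usual OBO PURL form. It is a plain text search, like the grep, not a real OWL parse.

```python
import re

# Message text copied from the log line quoted above; the rest of the
# log format is not assumed.
SKIP_RE = re.compile(r"Skipping taxon closures for unknown id:\s*(NCBITaxon:\d+)")


def unknown_taxa(log_text):
    """Return the sorted, de-duplicated taxon CURIEs the loader skipped."""
    return sorted(set(SKIP_RE.findall(log_text)))


def curie_to_iri(curie):
    """Map e.g. NCBITaxon:1136231 to its OBO PURL (assumed standard form)."""
    prefix, local_id = curie.split(":", 1)
    return "http://purl.obolibrary.org/obo/{}_{}".format(prefix, local_id)


def mentioned_in_owl(curie, owl_text):
    """Text-level check, equivalent to grepping the OWL file for the IRI.

    The negative lookahead stops NCBITaxon_1136231 from matching inside
    a longer ID such as NCBITaxon_11362310.
    """
    pattern = re.escape(curie_to_iri(curie)) + r"(?!\d)"
    return re.search(pattern, owl_text) is not None


def missing_taxa(log_text, owl_text):
    """Taxa reported as unknown in the log that the ontology never mentions."""
    return [t for t in unknown_taxa(log_text) if not mentioned_in_owl(t, owl_text)]
```

Here `log_text` would be the contents of the load log and `owl_text` the contents of a downloaded copy of taxslim.owl; a taxon returned by `missing_taxa` is a candidate for whatever seed list ends up feeding the slim.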

kltm avatar Jun 26 '19 01:06 kltm

@cmungall I believe that you originally made the taxslim ontology? What would be the procedure for getting something in there? Re: https://github.com/geneontology/amigo/issues/570#issuecomment-505678832

kltm avatar Jun 26 '19 01:06 kltm

@cmungall what is the origin of taxslim? Should we just expand GO ncbitaxon_import as needed and extract with ROBOT? Could keep a seed file in addition to the taxa directly referenced in the ontology.

balhoff avatar Jun 27 '19 19:06 balhoff

"there is something amiss with the way AspDB annotations appear in AmiGO."

Those were always like this. Only the ones coming from UniProt had the correct gene label.

Maybe this is not related but UniProt and AspDB and CGD weren't using the same tax id (although they were technically describing the same species).

Pascale

pgaudet avatar Jul 02 '19 16:07 pgaudet

Does it need fixing upstream? Who do we tag?

ValWood avatar Jul 02 '19 21:07 ValWood

@marekskrzypek

pgaudet avatar Jul 12 '19 17:07 pgaudet