robot
Duplicate label/synonym checks need to normalize literal type
The duplicate_label_synonym check will fail to find this duplicate:
AnnotationAssertion(rdfs:label <http://purl.obolibrary.org/obo/ENVO_03000085> "thermokarst"@en)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/ENVO_01001498> "thermokarst")
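This is because the values are not identical: the label is a language-tagged literal (@en), while the synonym is a plain literal with no language tag. We should normalize values to strings first. Note, however, that this could result in a significant slowdown, depending on the indexing strategy used.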
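To make the mismatch concrete, here is a minimal sketch in Python. It is not how ROBOT implements the check; the Literal and Annotation types and the find_duplicates function are hypothetical names made up for illustration. It only shows that comparing whole literals misses the pair above, while comparing lexical forms catches it.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an OWL literal: lexical form plus optional tags."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


class Annotation(NamedTuple):
    """Hypothetical stand-in for an annotation assertion."""
    subject: str
    prop: str
    value: Literal


def find_duplicates(annotations, normalize=True):
    """Group annotations by value and keep groups that span more than one subject.

    With normalize=True the key is the lexical form only, so language tags
    and datatypes are ignored. With normalize=False the whole literal is the key.
    """
    index = defaultdict(list)
    for ann in annotations:
        key = ann.value.lexical if normalize else ann.value
        index[key].append(ann)
    return {
        key: group
        for key, group in index.items()
        if len({ann.subject for ann in group}) > 1
    }


annotations = [
    Annotation("ENVO_03000085", "rdfs:label", Literal("thermokarst", lang="en")),
    Annotation("ENVO_01001498", "oboInOwl:hasExactSynonym", Literal("thermokarst")),
]

strict = find_duplicates(annotations, normalize=False)   # whole-literal comparison
relaxed = find_duplicates(annotations, normalize=True)   # lexical-form comparison
print(len(strict), len(relaxed))  # prints: 0 1
```

In this sketch the normalization costs nothing extra because the index is keyed on the lexical form up front. Whether the same holds for the real check depends on how the values are indexed; in a SPARQL-based check the analogous step would presumably be comparing STR() of each value, which may prevent the store from using an index on the literal.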
At some stage it may make more sense to implement some of these checks procedurally, depending on how far we go down the text-processing route.
Slightly tangential, but we really need a way to mark synonyms as allowable duplicates of labels (maybe using synonym type?). We have many cases in FBbt where the same acronym is used in the literature for multiple distinct anatomical structures (pretty common in anatomy). We add these as synonyms with a reference to back them up. This is frequently useful to anyone looking to find a term based on what they find in the literature - curators and users. I guess the rule originally comes from GO, where this is less of an issue with names for processes/MFs?
Perhaps marking as acronym might be sufficient.
@dosumis - I would mark these with a synonym type of acronym/initialism. Unfortunately, every ontology uses its own synonym types. Do you still scope ambiguous initialisms as exact?