
Duplicate label/synonym checks need to normalize literal type

Open cmungall opened this issue 4 years ago • 3 comments

The duplicate_label_synonym check will fail to find this:

AnnotationAssertion(rdfs:label <http://purl.obolibrary.org/obo/ENVO_03000085> "thermokarst"@en)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/ENVO_01001498> "thermokarst")

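This is because the values are not identical: the label is a language-tagged literal ("thermokarst"@en), while the synonym is a plain literal ("thermokarst"). We should normalize values to strings first. Note, however, that this could result in a significant slowdown, depending on the indexing strategy used.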

At some stage it may make more sense to implement some of these checks procedurally, depending on how far we go down the text-processing route.
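To make the normalization idea concrete, here is a minimal procedural sketch in Python. It is not how the check is currently implemented; the `Literal` type and the `normalize` and `find_label_synonym_clashes` functions are hypothetical names invented for illustration, and the literals are assumed to have already been extracted from the ontology as (entity, literal) pairs. Labels are indexed once by their lexical form, so each synonym lookup is a dictionary hit rather than a pairwise comparison, which is one way to limit the slowdown mentioned above.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Literal(NamedTuple):
    """Hypothetical stand-in for an OWL literal: lexical form plus optional tags."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def normalize(lit: Literal) -> str:
    """Reduce a literal to its plain string, dropping language tag and datatype."""
    return lit.lexical


def find_label_synonym_clashes(
    labels: Iterable[Tuple[str, Literal]],
    synonyms: Iterable[Tuple[str, Literal]],
) -> List[Tuple[str, str, str]]:
    """Return (synonym entity, label entity, value) for each cross-entity clash."""
    # Index labels by normalized value so each synonym needs one lookup.
    by_value = defaultdict(set)
    for entity, lit in labels:
        by_value[normalize(lit)].add(entity)

    clashes = []
    for entity, lit in synonyms:
        value = normalize(lit)
        for other in sorted(by_value.get(value, ())):
            if other != entity:  # a term may repeat its own label as a synonym
                clashes.append((entity, other, value))
    return clashes


# The two assertions from this issue, reduced to (entity, literal) pairs.
labels = [("ENVO_03000085", Literal("thermokarst", lang="en"))]
synonyms = [("ENVO_01001498", Literal("thermokarst"))]

# Compared as-is, the literals differ because only one carries a language tag.
raw_match = labels[0][1] == synonyms[0][1]

# After normalization the clash is found.
clashes = find_label_synonym_clashes(labels, synonyms)
```

Whether to also fold case or whitespace in `normalize` is a separate design decision; the sketch only strips the language tag and datatype, which is the minimum needed for the example in this issue.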

cmungall avatar Oct 05 '20 16:10 cmungall

Slightly tangential, but we really need a way to mark synonyms as allowable duplicates of labels (maybe using synonym type?). We have many cases in FBbt where the same acronym is used in the literature for multiple distinct anatomical structures (pretty common in anatomy). We add these as synonyms with a reference to back them up. This is frequently useful to anyone, curators and users alike, looking to find a term based on what they find in the literature. I guess the rule originally comes from GO, where this is less of an issue with names for processes/MFs?

dosumis avatar Oct 05 '20 18:10 dosumis

Perhaps marking as acronym might be sufficient.

dosumis avatar Oct 05 '20 18:10 dosumis

@dosumis - I would mark these with a synonym type of acronym/initialism. Unfortunately, every ontology uses its own synonym types. Do you still scope ambiguous initialisms as exact?

cmungall avatar Feb 10 '21 00:02 cmungall