Principle #12 naming conventions - automated validation

Open beckyjackson opened this issue 6 years ago • 9 comments

FP 12 - Naming Conventions

Automated checks:

  1. All entities must have labels
  2. No entities may share a label
  3. No entity may have more than one label

Mechanism: ROBOT report already includes checks 1 through 3. We can run report and only look at the results of these three checks. If any of the rules are violated, the check fails.
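As a rough illustration of what the three checks test, here is a minimal Python sketch. It does not call ROBOT; it assumes the labels have already been extracted into a hypothetical in-memory mapping from entity ID to a list of labels, and the function names and result keys are chosen here for readability only.

```python
from collections import defaultdict


def check_labels(entities):
    """Run the three label checks on a dict of entity ID -> list of labels.

    The input shape is an assumption made for this sketch, not a ROBOT format.
    """
    # Check 1: every entity must have at least one label.
    missing = sorted(e for e, labels in entities.items() if not labels)
    # Check 3: no entity may have more than one label.
    multiple = sorted(e for e, labels in entities.items() if len(labels) > 1)
    # Check 2: no two entities may share a label.
    by_label = defaultdict(set)
    for entity, labels in entities.items():
        for label in labels:
            by_label[label].add(entity)
    duplicate = {
        label: sorted(ids) for label, ids in by_label.items() if len(ids) > 1
    }
    return {
        "missing_label": missing,
        "duplicate_label": duplicate,
        "multiple_labels": multiple,
    }


def passes(result):
    """The overall check fails if any of the three rules is violated."""
    return not any(result.values())
```

Calling `passes(check_labels(...))` mirrors the proposed mechanism: a single violation of any rule fails the whole check.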

We also may want to look at overlapping labels at some point (entities from separate ontologies that share a label) and determine if these need an 'OBO Foundry unique label', though I'm not sure if that needs to be addressed right now.

beckyjackson avatar Aug 09 '19 15:08 beckyjackson

From EWG discussion on this:

labels must be unique within ontology, lowercase, no underscores

nataled avatar Sep 24 '19 17:09 nataled

Checking that labels start with a lowercase character could be something we can add to ROBOT report. I wouldn't say it was an error, though, as there may be exceptions - either warning or even an info message? Underscore checking, as well, although I'm trying to think if there may be exceptions to this. @jamesaoverton - what do you think?

beckyjackson avatar Sep 27 '19 15:09 beckyjackson

I agree about uniqueness. ROBOT already checks that.

There are lots of old terms that include underscores, especially relations. I'd like to switch them to spaces for consistency, I just worry that changing labels can break things, and I don't know how important that really is.

While lowercase is a good rule of thumb, I can think of so many valid exceptions that I don't see how we can make a worthwhile automated check. Just looking at OBI, we have plenty of terms with labels that include proper names (companies, trademarked devices, 'Bernoulli trial'), taxa ('Mus musculus'), others like "B cell" and "T cell", all of which seem legitimate to me. We also have cases where we use an acronym as part of a label when it's better known than the expanded version, which we do judiciously.

jamesaoverton avatar Sep 27 '19 16:09 jamesaoverton

Underscores for relations are indeed relatively accepted (and actually rather useful), but not for other terms. You're spot on with the lowercase issues, though lowercase is indeed the default casing that should be used (other than the usual exceptions for proper names and very common abbreviations such as 'DNA'). Oh, forgot that CamelCase is also not allowed.
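A minimal Python sketch of how underscore and CamelCase warnings might be raised, with relations exempt from the underscore rule. The `is_relation` flag, the function name, and the regular expression are all assumptions for illustration; the pattern is deliberately narrow so that labels such as 'B cell', 'Mus musculus', and 'mRNA' are not flagged.

```python
import re

# A word with an uppercase letter after a lowercase run, followed by more
# lowercase letters (e.g. "CellLine", "cellLine"). Heuristic, not exhaustive.
CAMEL_CASE = re.compile(r"\b[A-Za-z][a-z]+[A-Z][a-z]+")


def style_warnings(label, is_relation=False):
    """Return a list of warning names for one label (warnings, not errors)."""
    warnings = []
    # Underscores are tolerated for relations but not for other terms.
    if "_" in label and not is_relation:
        warnings.append("underscore")
    if CAMEL_CASE.search(label):
        warnings.append("camel case")
    return warnings
```

For example, `style_warnings("part_of", is_relation=True)` returns an empty list, while `style_warnings("cell_line")` returns `["underscore"]`.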

The NCBITaxon exceptions are so ubiquitous that there is probably no need to run this check on it. Then again, no one maintains that ontology so none of the principles actually apply to it.

Perhaps the casing check could be as simple as "XYZ ontology has nn% terms that are uppercase." I would say being close to 100% for NCBITaxon is to be expected, but the number should be relatively small for other ontologies.
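A minimal sketch of such a summary in Python, assuming that "uppercase" means the label starts with an uppercase character (the comment does not pin this down) and that labels are available as a plain list of strings:

```python
def uppercase_percentage(labels):
    """Percentage of labels whose first character is uppercase."""
    labels = list(labels)
    if not labels:
        return 0.0
    upper = sum(1 for label in labels if label[:1].isupper())
    return 100.0 * upper / len(labels)


def casing_summary(ontology_id, labels):
    """Format the one-line report; the wording is only an example."""
    pct = uppercase_percentage(labels)
    return f"{ontology_id} ontology has {pct:.0f}% terms that are uppercase."
```

A report like this leaves the judgment to a reviewer, which avoids having to encode every legitimate exception.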

One other thing--I hesitate to even mention it--is that we could maintain a list of accepted uppercase labels. I actually do this for PRO; that is, I have a file that lists things that are okay, like Holliday and Golgi, and allow those to 'pass'. I hesitate to mention it because of the maintenance and portability issues that would come with implementing such a mechanism. I suppose a separate file could be created that contains some minimal set, and users could add to it after download, and maybe even suggest additions. ROBOT could look for this file (if it exists) and read its contents.
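The optional-file idea could look roughly like this in Python. This is a sketch under assumptions, not how ROBOT or PRO actually does it: the file is assumed to hold one accepted capitalized word per line (with `#` comment lines), and a label is flagged if it contains any capitalized word that is not in that list.

```python
from pathlib import Path


def load_allowlist(path):
    """Read accepted capitalized words, or return an empty set if no file exists."""
    p = Path(path)
    if not p.exists():
        return set()
    words = set()
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line)
    return words


def flagged_labels(labels, allowed):
    """Return labels containing a capitalized word that is not allowlisted."""
    flagged = []
    for label in labels:
        if any(w[:1].isupper() and w not in allowed for w in label.split()):
            flagged.append(label)
    return flagged
```

With `{"Holliday", "Golgi"}` as the allowlist, 'Holliday junction' and 'Golgi apparatus' pass, while a label such as 'Cell Cycle' is flagged. Because a missing file simply yields an empty set, the check still runs without it.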

nataled avatar Sep 27 '19 16:09 nataled

It seems ROBOT checks label uniqueness only among terms with the prefix assigned to a given ontology, not across all imported terms. It would be good to check all terms in an ontology and warn ontology developers when entities share a label.

The VEuPathDB ontology made a release on 2019-12-16. During the release process, we found that IDO_0000586 and OBI_1110021 shared the label 'infection' because the imported OBI terms were out of date. The issue was identified by manual review rather than by ROBOT.

zhengj2007 avatar Jan 31 '20 21:01 zhengj2007

@zhengj2007 This is a little tricky:

  1. The robot report query for duplicate labels does not filter by prefix -- it includes all labels in the loaded ontology. I suspect that you were running robot report on an editing version of your ontology, without the imports merged. If there's a bug with this, it would be better reported on the ROBOT tracker.

  2. However the OBO Dashboard tests do filter by prefix, so that the dashboard does not report problems with imports. I think that's the correct behaviour.
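The difference between the two behaviours can be sketched in Python as follows. This is an illustration only: the exact filter the dashboard applies is not shown here, so the sketch assumes it simply restricts the comparison to entity IDs that start with the ontology's own prefix. The `EX_` prefix and term are hypothetical; the two 'infection' IDs are the ones reported above.

```python
from collections import defaultdict


def duplicate_labels(label_by_id, prefix=None):
    """Find labels shared by more than one entity.

    With prefix=None every loaded entity is compared (report-style).
    With a prefix, only entities in that namespace are compared
    (assumed dashboard-style behaviour).
    """
    by_label = defaultdict(list)
    for entity_id, label in sorted(label_by_id.items()):
        if prefix is None or entity_id.startswith(prefix):
            by_label[label].append(entity_id)
    return {label: ids for label, ids in by_label.items() if len(ids) > 1}


terms = {
    "IDO_0000586": "infection",
    "OBI_1110021": "infection",
    "EX_0000001": "example term",  # hypothetical term owned by the ontology
}
```

Here `duplicate_labels(terms)` reports the shared 'infection' label, while `duplicate_labels(terms, prefix="EX_")` reports nothing, because both duplicates come from imports.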

jamesaoverton avatar Jan 31 '20 22:01 jamesaoverton

@jamesaoverton Thanks for the explanation. I did not run robot report during the release; I downloaded the results from the OBO Dashboard tests, which is why the duplicate was not identified. However, it might be good to include this check in the OBO Dashboard tests and have it raise a warning.

zhengj2007 avatar Feb 03 '20 15:02 zhengj2007

What's the status of this? Is this now covered by the dashboard checks?

nlharris avatar Jan 26 '22 22:01 nlharris

Status unsure, pending review by EWG.

nataled avatar Jan 26 '22 22:01 nataled