OBOFoundry.github.io

Principle #6 "Textual definition"

Open nataled opened this issue 6 years ago • 13 comments

Current wording:

NOTE

The content of this page is scheduled to be reviewed. Improved wording will be posted as it becomes available.

Summary

The ontology has textual definitions for the majority of its classes and for top level terms in particular.

Purpose

A textual definition provides a human-readable understanding about what is a member of the associated class.

Recommendation

Textual definitions MUST be unique within an ontology (i.e. no two terms should share a definition). Textual definitions SHOULD follow Aristotelian form (e.g. “a B that Cs” where B is the parent and C is the differentia), where this is practical.

For terms lacking textual definitions, there should be evidence of implementation of a strategy to provide definitions for all remaining undefined terms. In lieu of textual definitions, there can be elucidations when the term can not be rigorously defined.

Terms often benefit from examples of usage, as well as editor notes about edge cases and the history of the term, but these should be included as separate annotations and not in the definition.

Instances, such as organizations or geographical locations, can benefit from definitions although it is understood that definitions for instances are not required. It is recognized that OBO format (e.g., versions 1.2 and 1.4) does not allow this as an option.

Implementation

Logical definitions should agree with textual definitions. In fact, logical definitions can be programmatically used to generate textual definitions (see http://oro.open.ac.uk/21501/1/)

Textual definitions should be identified using the annotation property: ‘definition’ http://purl.obolibrary.org/obo/IAO_0000115. The source of the definition should be provided using the annotation property ‘definition source’ http://purl.obolibrary.org/obo/IAO_0000119, or as an axiom annotation on the definition assertion.

An example of providing source in an axiom annotation:

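@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://purl.obolibrary.org/obo/GO_0000109> rdf:type owl:Class ;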
                                            <http://purl.obolibrary.org/obo/IAO_0000115> "Any complex formed of proteins that act in nucleotide-excision repair."@en ;
                                            rdfs:label "nucleotide-excision repair complex"^^xsd:string .

[ rdf:type owl:Axiom ;
   owl:annotatedSource <http://purl.obolibrary.org/obo/GO_0000109> ;
   owl:annotatedProperty <http://purl.obolibrary.org/obo/IAO_0000115> ;
   owl:annotatedTarget "Any complex formed of proteins that act in nucleotide-excision repair."@en ;
   <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "PMID:10915862"^^xsd:string
 ] .

This corresponds to the following OBO format:

id: GO:0000109
name: nucleotide-excision repair complex
def: "Any complex formed of proteins that act in nucleotide-excision repair." [PMID:10915862]
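As a rough illustration of how the definition text and its source travel together in the OBO-format line above, here is a minimal Python sketch that splits a `def:` line into the text and its bracketed xrefs. The function name `parse_def_line` is hypothetical, and the regular expression is a simplification: it assumes plain comma-separated xrefs and does not handle xref descriptions or escaped commas.

```python
import re

# Matches: def: "<text>" [<xref>, <xref>, ...]
# Simplified: xrefs are assumed to be plain IDs separated by commas.
DEF_LINE = re.compile(
    r'^def:\s*"(?P<text>(?:[^"\\]|\\.)*)"\s*\[(?P<xrefs>[^\]]*)\]\s*$'
)


def parse_def_line(line):
    """Split an OBO-format def: line into (definition text, list of xrefs)."""
    match = DEF_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a def: line: {line!r}")
    # Undo the escaping of double quotes inside the definition text.
    text = match.group("text").replace('\\"', '"')
    xrefs = [x.strip() for x in match.group("xrefs").split(",") if x.strip()]
    return text, xrefs


line = (
    'def: "Any complex formed of proteins that act in '
    'nucleotide-excision repair." [PMID:10915862]'
)
text, xrefs = parse_def_line(line)
print(text)
print(xrefs)
```

The text corresponds to the IAO_0000115 value and each xref to a hasDbXref axiom annotation in the Turtle rendering.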

Examples

Class: reproductive shoot system
Term IRI: http://purl.obolibrary.org/obo/PO_0025082
Definition: A shoot system (PO:0009006) in the sporophytic phase that has as part at least one sporangium (PO:0025094).
Logical definition:

intersectionOf: participates_in some reproductive shoot system development stage

Class: chromatography device
Term IRI: http://purl.obolibrary.org/obo/OBI_0000048
Definition: A device that facilitates the separation of mixtures. The function of a chromatography device involves passing a mixture dissolved in a "mobile phase" through a stationary phase, which separates the analyte to be measured from other molecules in the mixture and allows it to be isolated.
Definition source: http://en.wikipedia.org/wiki/Chromatography
Logical definition:

intersectionOf: device
intersectionOf: has_function some material separation function

Counter-Examples

  • No definition
  • Circular/Self-referential definition “A chromatography device is a device that uses chromatography” when chromatography is not defined elsewhere

Date Accepted

  • Revised wording for principle tentatively accepted June 19, 2018.

nataled avatar Aug 05 '19 14:08 nataled

Clarifications required:

  1. What percentage of defined terms is acceptable? Our bidding: At least 50%. Top level terms should all be defined. This should discourage overly flat ontology structures. Most terms that have children should have definitions. For our guideline, should we suggest that the main focus be on non-leaf (parent) terms? Or should it be ‘low hanging fruit’?

  2. Edge case: Certain ontologies, such as CHEBI, rely in part on structures to define classes. How to include this? Note that we don't really want to encourage this workaround.
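To make the coverage question in item 1 concrete, here is a minimal Python sketch of how such a check might be computed. Everything in it is an assumption for illustration: the term IDs are made up, the 50% threshold is only the figure proposed above, and a "top level" term is taken to be any term without a recorded parent.

```python
def definition_coverage(terms):
    """Return the fraction of terms that have a non-empty textual definition."""
    if not terms:
        return 0.0
    defined = sum(1 for d in terms.values() if d and d.strip())
    return defined / len(terms)


def undefined_top_level(terms, parents):
    """List terms that have no parent and no textual definition."""
    return sorted(
        t for t in terms
        if not parents.get(t) and not (terms[t] or "").strip()
    )


# Hypothetical toy ontology: term ID -> textual definition (or None).
terms = {
    "EX:0000001": "A material entity that bears some function.",
    "EX:0000002": None,
    "EX:0000003": "An EX:0000001 that is used in a laboratory.",
    "EX:0000004": "",
}
# Hypothetical is_a parents; an empty list marks a top-level term.
parents = {
    "EX:0000001": [],
    "EX:0000002": [],
    "EX:0000003": ["EX:0000001"],
    "EX:0000004": ["EX:0000002"],
}

coverage = definition_coverage(terms)
missing_roots = undefined_top_level(terms, parents)
print(f"coverage: {coverage:.0%}, undefined top-level terms: {missing_roots}")
```

Reporting the undefined top-level terms separately keeps the "top level terms should all be defined" requirement distinct from the overall percentage.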

nataled avatar Aug 05 '19 14:08 nataled

"At least 50%"

Seems reasonable. Only caveat is that we don't want people gaming this. E.g. when we originally introduced a minimum % prior to CHEBI's review, they added lots of text definitions by what may have been an automated process. Many of these were duplicates. Arguably this was an overall drop in quality caused by a metric.

As an aside, it's super-easy for any ontology with a significant compositional component (>50% of terms follow design patterns, DPs) using modern DP-based development to pass this, since dosdps etc. will autogenerate text definitions.
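For readers unfamiliar with pattern-generated definitions, the idea can be sketched in a few lines of Python. This is not the dosdp tooling itself: the template, the labels, and the function name `render_definition` are hypothetical, and the sketch only assumes a printf-style template with one `%s` slot per filler.

```python
def render_definition(template, fillers):
    """Fill a printf-style text-definition template with term labels."""
    expected = template.count("%s")
    if expected != len(fillers):
        raise ValueError(
            f"template expects {expected} fillers, got {len(fillers)}"
        )
    return template % tuple(fillers)


# Hypothetical genus-differentia template and filler labels.
TEMPLATE = "Any %s that has its soma located in some %s."
generated = render_definition(TEMPLATE, ["neuron", "antennal lobe"])
print(generated)
```

Because every term built from the same pattern gets a definition of the same shape, a percentage metric alone says little about whether the definitions were written with care.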

"Most terms that have children should have definitions"

Seems arbitrary. I suppose the thinking is that these are inherently used in classification.

"Certain ontologies, such as CHEBI, rely in part on structures to define classes. How to include this? Note that we don't really want to encourage this workaround."

Let's just grandfather in CHEBI as an edge case just now, it's probably the only case of this.

cmungall avatar Aug 06 '19 00:08 cmungall

"intersectionOf: participates_in some reproductive shoot system development stage"

This is an incoherent hybrid of OBO format and Manchester syntax. I'm going to have nightmares.

cmungall avatar Aug 06 '19 00:08 cmungall

I am a bit unsure how useful some arbitrary required percentage of terms would be. Some ontologies might be forever forced to have a red box on the OBO dashboard. At the very least, I would downgrade this 50% coverage requirement to SHOULD, and the unique definition to SHOULD as well (or instruct people to use a different relation when the definition is something other than "necessary and sufficient"). Additionally, there are now many cases of fine-grained cell types etc. that are clearly different concepts (if you look, say, at a picture you would see the difference), but are not easily amenable to verbal distinction. @dosumis can give many examples to that end. I think it's more important that all the top terms in the hierarchies are well defined, to facilitate integration efforts, for example with OBO Core.

matentzn avatar Feb 28 '20 13:02 matentzn

I'm somewhat ambivalent about the need for a unique textual definition. The sentences "Most cars have four wheels" and "Most cars are four-wheeled" are unique with respect to string, but not with respect to meaning. What we really want to capture is the uniqueness of meaning, which I do think MUST be unique. The uniqueness of the textual definition, I believe, is just an approximation of that.
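A small Python sketch shows why a string-level check only approximates uniqueness of meaning. The term IDs and the normalization rules (lowercasing, collapsing whitespace, dropping a trailing period) are assumptions chosen for illustration, not part of any existing OBO check.

```python
from collections import defaultdict


def normalize(definition):
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(definition.lower().split()).rstrip(".")


def duplicate_definitions(definitions):
    """Group term IDs whose definitions are identical after normalization."""
    groups = defaultdict(list)
    for term_id, text in definitions.items():
        groups[normalize(text)].append(term_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical term IDs; the sentences are the ones from the comment above.
definitions = {
    "EX:0000001": "Most cars have four wheels.",
    "EX:0000002": "most cars  have four wheels",
    "EX:0000003": "Most cars are four-wheeled.",
}
duplicates = duplicate_definitions(definitions)
print(duplicates)
```

The check groups the first two terms, which differ only in case, spacing, and punctuation, but it cannot see that the third says the same thing in different words.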

On a related note, I'd say that having unique logical definitions I would say is a MUST, but we actually don't (yet) have a principle for that.

nataled avatar Feb 28 '20 14:02 nataled

"On a related note, I'd say that having unique logical definitions I would say is a MUST, but we actually don't (yet) have a principle for that."

Amen to that. I added this requirement to many ODK repos I support. ROBOT supports this with a parameter on robot reason.

matentzn avatar Feb 28 '20 14:02 matentzn

"On a related note, I'd say that having unique logical definitions I would say is a MUST, but we actually don't (yet) have a principle for that."

To clarify: Where terms have logical definitions they should be unique. But there should be no compulsion to add logical definitions. They should only be added where we are confident they are safe.

dosumis avatar Feb 28 '20 14:02 dosumis

In my opinion, it is a bad smell for ontologies to have all terms logically defined. It usually means people are trying too hard to logically define things, and that many of the definitions must be either wrong or incomplete. What would be a good principle is to have all multiple inheritance inferred using a reasoner; that I could get behind ;-).

mellybelly avatar Feb 28 '20 14:02 mellybelly

Oh no @mellybelly! This is not how I understood @nataled! I thought he meant: if there is a logical def, it should be unique!

matentzn avatar Feb 28 '20 14:02 matentzn

@mellybelly - agree. But perhaps best not to discuss on this ticket :) @matentzn - just wanted to clarify

Back on topic (kinda): We're contemplating adding several hundred new neuron types to FBbt, based on bulk connectomics data. Writing textual definitions for these would be a serious burden, and probably not massively useful to our users. Instead, we're pursuing a strategy of linking classes to reference instances (exemplars) for which we have image/connectomic data. These can be used to judge (by inspection or programmatically) whether a neuron belongs to the class. I'm wondering whether we should have a policy of allowing reference data in place of a textual definition.

dosumis avatar Feb 28 '20 14:02 dosumis

What's the status of this discussion?

nlharris avatar Jul 28 '20 01:07 nlharris

Not sure if this is relevant, but using an annotation axiom on the definition to provide the db xref does not allow for the db xref to be annotated with other metadata; e.g. that PMID:10915862 was published in 2000.

Perhaps this limitation doesn't matter much.

wdduncan avatar Mar 15 '21 01:03 wdduncan

This has been open for over three years, let's see if we can either close this or turn it into actionable issues.

Note that the current wording of principle 6 has improved somewhat from the text that is quoted in the top comment (perhaps in response to this issue; however, I don't see any linked PRs).

I see a lot of vigorous agreement about general content of lexical and logical definitions:

  • definitions should follow genus-differentia ("Aristotelian") form where appropriate (S3)
  • logical definitions should match textual definitions (S11)
  • logical definitions should be stated where appropriate, and only where appropriate (i.e., don't overstate / overaxiomatize)

I think these are largely reflected in the current wording of the principle, great work!

These are what I think of as necessary changes:

  • [x] the reproductive shoot example uses some kind of strange and confusing pseudo-obo format syntax. Stick to Manchester syntax when providing example logical definitions. I don't think it necessary to include obo format syntax, but if we do, use actual valid syntax
  • [x] remove the OBI example, it's no longer correct, OBI does not have any logical definition for this term
  • [x] the spermatocyte example is problematic - the proposed fix is wrong and would lead to sperm being classified as a subtype of spermatocyte. I know the example is just to show a general principle but we really shouldn't be giving examples that are wrong

I think these are easily addressed by non-controversial PRs

I think there are some things that may be improved, but are not outright wrong:

  • [ ] informative sections intended as practical guides (e.g. DOSDPs) should link to obook
    • e.g. dosdp guide
    • logical definitions guide
    • in general the obook is a better place for how-tos, and the OBO page is a better place for crisp specification of conformance criteria
  • [ ] there could be more references to additional guidelines on definitions
    • the SRS definitions paper (https://philpapers.org/archive/SEPGFW.pdf)
    • my blog post on the SRS paper: https://douroucouli.wordpress.com/2019/07/08/ontotip-write-simple-concise-clear-operational-textual-definitions/
  • [ ] I think simpler biological examples throughout would be better, using rock-solid design patterns we know are unlikely to change. My go-to examples are:
    • neuron-has-soma-location
    • cell-by-function
    • biosynthesis with chebi as output
  • [ ] "Aristotelian" is weird philosophical jargon. I think "genus-differentia", while still jargony, is better terminology, and communicates the desired form and purpose of text definitions better.

Some things I think that are out of scope for this issue:

  • detailed ontology metadata discussions
    • keep this on the OMO tracker
    • @wdduncan you can follow up on your question here: https://github.com/information-artifact-ontology/ontology-metadata/issues/43

I think if we make a push we can quickly close this issue finally.

A next phase for this principle is computational validation of the content of the text definition, adherence to the SRS standard. But I think we should make a separate issue for this. For now if people are interested in ongoing work on text definition validation, you can follow this issue:

  • https://github.com/INCATools/ontology-access-kit/issues/305

cmungall avatar Oct 14 '22 15:10 cmungall

One PR, that fixes the part under 'Examples': https://github.com/OBOFoundry/OBOFoundry.github.io/pull/2165

nataled avatar Oct 25 '22 16:10 nataled

The following PR addresses the spermatocyte example: https://github.com/OBOFoundry/OBOFoundry.github.io/pull/2186

Work to align this principle with the SRS recommendations is underway.

nataled avatar Nov 08 '22 18:11 nataled

This issue will be closed in favor of new, more tightly-scoped issues:

  • Alignment with the SRS paper - #2201
  • Links to information - #2202

Other checkboxed suggestions were discussed (EWG), and it was decided that the text is okay without these.

nataled avatar Nov 22 '22 17:11 nataled