OBOFoundry.github.io

Principle #9 users - automated validation

Open beckyjackson opened this issue 5 years ago • 10 comments

FP 9 - Documented Plurality of Users

Automated checks:

  1. Is there a valid issue tracker?
  2. Are there stated usages?

Mechanism:

We can pull the tracker value from the ontology YAML. We should ensure that this tracker resolves (does not return an HTTP status of 400 or higher). It would be nice to check whether there is activity on the tracker, but I'm not sure that is possible at this time. I'm open to suggestions. If the ontology does not have a tracker, this check fails.
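
A minimal sketch of the tracker check described above, assuming the ontology entry has already been loaded from YAML into a Python dict. The names `fetch_status` and `check_tracker`, and the level strings `"fail"` and `"pass"`, are hypothetical. The sketch treats status 400 itself as a failure, which is an assumption about the intended cutoff.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was obtained."""
    try:
        request = Request(url, method="HEAD", headers={"User-Agent": "fp9-check-sketch"})
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (URLError, OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, and similar


def check_tracker(entry, get_status=fetch_status):
    """Return a (level, message) pair for the 'tracker' value of one ontology entry."""
    tracker = entry.get("tracker")
    if not tracker:
        return ("fail", "no tracker listed")
    status = get_status(tracker)
    # Assumption: any status of 400 or above, or no response at all, is a failure.
    if status is None or status >= 400:
        return ("fail", f"tracker does not resolve: {tracker} (status {status})")
    return ("pass", f"tracker resolves: {tracker}")
```

The status lookup is passed in as a parameter so the check can be exercised without network access. Some servers reject `HEAD` requests, so a real check might need to fall back to `GET`.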

We can also look at the usages tag from the ontology YAML. If there are no documented usages, the ontology will get a warning. The usages should contain a user property with a valid URL. Perhaps if the URL does not resolve, we just return an info message.
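
The usages check could look roughly like the sketch below, again assuming the entry is a dict loaded from the ontology YAML. `check_usages`, `is_valid_url`, and the `resolves` callback are hypothetical names. The text above does not say which level a missing or malformed `user` should get, so reporting it as a warning is an assumption.

```python
from urllib.parse import urlparse


def is_valid_url(value):
    """True if value is a string with an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_usages(entry, resolves=lambda url: True):
    """Return a list of (level, message) pairs for the 'usages' of one entry."""
    usages = entry.get("usages") or []
    if not usages:
        return [("warning", "no documented usages")]
    results = []
    for index, usage in enumerate(usages):
        user = usage.get("user") if isinstance(usage, dict) else None
        if not is_valid_url(user):
            # Assumption: a missing or malformed 'user' is reported as a warning.
            results.append(("warning", f"usage {index}: 'user' is missing or not a URL"))
        elif not resolves(user):
            results.append(("info", f"usage {index}: 'user' URL does not resolve: {user}"))
    return results
```

The default `resolves` callback accepts every URL. A caller would pass in a function that performs the actual HTTP request.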

We may need to standardize the usages tag. Currently, there are multiple ways that people have inserted usages. For example, ENVO contains two different examples of usages:

usages:
 - type: data-annotation
   description: "describing species habitats"
   examples:
     url: http://eol.org/pages/211700/data
   resources:
     url: http://eol.org
     label: EOL

usages:
  - user: http://oceans.taraexpeditions.org/en/
    description: Samples collected during Tara Oceans expedition are annotated with ENVO
    example:
      - url: https://www.ebi.ac.uk/metagenomics/projects/ERP001736/samples/ERS487899
        description: "Sample collected during the Tara Oceans expedition (2009-2013) at station TARA_004 (latitudeN=36.5533, longitudeE=-6.5669)"

I propose the following format for usages:

usages:
  - user: required URL
    type: optional text
    description: required text
    example:
      - url: required URL
        description: required text

beckyjackson avatar Aug 09 '19 15:08 beckyjackson

From the EWG discussion on this:

Partial automation possible, especially with respect to use of its terms in other ontologies and citations.

Chris M commented: The curation of usages must be manual and closely vetted by OBO Foundry.

We have usages partially curated here:

https://github.com/OBOFoundry/OBOFoundry.github.io/issues/451

Once in place the checks themselves can be automated.

Also easy to check things like GH activity. While it's conceivable that some ontologies with multiple users don't use GH, it is at least a meaningful signal.
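
One possible way to check GH activity, sketched under the assumption that the tracker is a github.com URL and that the GitHub REST API's "list repository issues" endpoint is queried with its `state`, `since`, and `per_page` parameters. The helper names `github_repo` and `recent_issues_url` and the 365-day default window are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode, urlparse


def github_repo(tracker_url):
    """Return (owner, repo) for a github.com tracker URL, else None."""
    parts = urlparse(tracker_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        return None
    return segments[0], segments[1]


def recent_issues_url(owner, repo, now=None, days=365):
    """Build an API URL listing issues updated within the last `days` days."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    since = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # per_page=1 is enough: a non-empty JSON array in the response means activity.
    query = urlencode({"state": "all", "since": since, "per_page": 1})
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"
```

Fetching that URL and testing whether the returned JSON array is non-empty would give a yes/no activity signal. Unauthenticated API requests are rate-limited, so a run over every ontology would probably need a token.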

nataled avatar Sep 24 '19 17:09 nataled

The principle says: "Use of the target ontology’s term IRIs in other ontologies. This can be evidenced by linking to the other ontology that uses an ontology term IRI from this ontology." We could search for term use in other ontologies.
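
A rough sketch of such a search, assuming the term IRIs mentioned in each ontology have already been extracted into sets (how that extraction is done is out of scope here) and that terms follow the usual OBO PURL pattern `http://purl.obolibrary.org/obo/PREFIX_ID`. The function names are hypothetical.

```python
import re

# Matches OBO PURLs such as http://purl.obolibrary.org/obo/ENVO_00000001
OBO_TERM = re.compile(r"^http://purl\.obolibrary\.org/obo/([A-Za-z][A-Za-z0-9]*)_\w+$")


def term_prefix(iri):
    """Return the lowercased ontology prefix of an OBO term IRI, or None."""
    match = OBO_TERM.match(iri)
    return match.group(1).lower() if match else None


def external_users(target, iris_by_ontology):
    """Map each other ontology id to the sorted target-ontology IRIs it mentions."""
    target = target.lower()
    users = {}
    for ontology_id, iris in iris_by_ontology.items():
        if ontology_id.lower() == target:
            continue  # an ontology mentioning its own terms is not an external user
        hits = sorted(iri for iri in iris if term_prefix(iri) == target)
        if hits:
            users[ontology_id] = hits
    return users
```

A raw list like this only shows that an IRI appears in another ontology. It cannot tell deliberate reuse of a term from a term that arrived through a wholesale import.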

jamesaoverton avatar Nov 05 '19 17:11 jamesaoverton

I agree with @beckyjackson's proposed standardization of the usages tag.

I think querying for ontology usage also makes sense. It would be fun to do a more in-depth analysis to identify "citation rings" and other artefacts.

cmungall avatar Nov 05 '19 18:11 cmungall

Could this check also look at the 'browser' section on the OBO Foundry page (https://github.com/OBOFoundry/OBOFoundry.github.io/blob/master/ontology/mp.md)? The MP entry lists the MGI, RGD, and Monarch browsers, and I was wondering if that should/could contribute to the plurality of users check.

sbello avatar Feb 25 '20 15:02 sbello

We can also query eutils to look at the number of citations of the publication(s).

We could also add this as links from the obo site, e.g. we track the uberon pmid as 22293552, can add a link to:

https://www.ncbi.nlm.nih.gov/pubmed?linkname=pubmed_pubmed_citedin&from_uid=22293552

Of course, many ontologies are under-cited, but it's a proxy
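
A sketch of the eutils query, assuming the E-utilities `elink.fcgi` endpoint with the same `pubmed_pubmed_citedin` link name as the URL above, and assuming the default XML response lists each citing article as a `Link/Id` element inside a `LinkSetDb`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
CITED_IN = "pubmed_pubmed_citedin"


def citedin_url(pmid):
    """Build the ELink URL asking which PubMed records cite the given PMID."""
    query = urlencode({"dbfrom": "pubmed", "linkname": CITED_IN, "id": str(pmid)})
    return f"{ELINK}?{query}"


def count_citing(xml_text):
    """Count the citing records listed in an ELink XML response."""
    root = ET.fromstring(xml_text)
    total = 0
    for linkset_db in root.iter("LinkSetDb"):
        if linkset_db.findtext("LinkName") == CITED_IN:
            total += len(linkset_db.findall("Link/Id"))
    return total


def fetch_citation_count(pmid, timeout=10):
    """Fetch and count citing records for one PMID (needs network access)."""
    with urlopen(citedin_url(pmid), timeout=timeout) as response:
        return count_citing(response.read())
```

For example, `fetch_citation_count(22293552)` would query the count for the Uberon PMID mentioned above. The result only covers citing articles that PubMed itself has linked.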

We can also do a Google search for mentions of the ontology (but this can't be done via API, AFAIK).

cmungall avatar Apr 17 '20 16:04 cmungall

Also note that ontologies can be over-cited too. These are cases where the ontology was mentioned (usually as part of a "such as..." list) but not used or studied in any way. This is similar to what happens in OntoBee when it shows term usage in other ontologies, the vast majority of which are due to some wholesale import of the ontology (but the term in question was never used).

nataled avatar Apr 17 '20 16:04 nataled

Very good point @nataled! Dare I say it, a lot of this over-citation may come from papers about ontologies...

cmungall avatar Apr 21 '20 01:04 cmungall

There are no objections to the schema @beckyjackson proposes

I would add: make examples mandatory, but multivalued, i.e. cardinality >= 1.

cmungall avatar Nov 30 '20 18:11 cmungall

@apmody and I are working on this in #1371. The proposed schema above is a little too simple. People are making good use of seeAlso to point to Biosharing/FAIRSharing, and of reference to link to publications about the usage. So we're going to try this schema:

usages:
  - user: required URL
    type: optional text (how the ontology is used, e.g. annotation)
    description: required text
    seeAlso: optional URL (e.g. FAIRSharing entry)
    examples:
      - url: required URL
        description: required text
    publications:
      - id: required URL (DOI, PubMed, etc.)
        title: required text

  • seeAlso is a secondary link to the user, such as a FAIRSharing entry (these are more common than I expected)
  • examples should point to a specific page showing how the ontology is used by that user/resource
  • publications are papers about how the user uses the ontology, not specific examples of use
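
Once usages follow this schema, checking an entry could be automated along the lines of the sketch below. It assumes each usage has been loaded from YAML into a dict. The function names are hypothetical. Requiring at least one example follows the earlier comment in this thread, and treating `publications` as optional is an assumption, since the schema does not say.

```python
from urllib.parse import urlparse


def _is_url(value):
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def _is_text(value):
    return isinstance(value, str) and bool(value.strip())


def _check_items(items, name, url_key, text_key, problems):
    """Check a list of mappings that each need one URL field and one text field."""
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"{name}[{i}]: expected a mapping")
            continue
        if not _is_url(item.get(url_key)):
            problems.append(f"{name}[{i}].{url_key}: required URL")
        if not _is_text(item.get(text_key)):
            problems.append(f"{name}[{i}].{text_key}: required text")


def validate_usage(usage):
    """Return a list of problems found in one 'usages' item (empty if none)."""
    if not isinstance(usage, dict):
        return ["usage: expected a mapping"]
    problems = []
    if not _is_url(usage.get("user")):
        problems.append("user: required URL")
    if not _is_text(usage.get("description")):
        problems.append("description: required text")
    if "type" in usage and not _is_text(usage["type"]):
        problems.append("type: optional text")
    if "seeAlso" in usage and not _is_url(usage["seeAlso"]):
        problems.append("seeAlso: optional URL")
    examples = usage.get("examples")
    if not isinstance(examples, list) or not examples:
        problems.append("examples: at least one is required")
    else:
        _check_items(examples, "examples", "url", "description", problems)
    # Assumption: 'publications' may be omitted, but listed items must be complete.
    publications = usage.get("publications", [])
    if not isinstance(publications, list):
        problems.append("publications: expected a list")
    else:
        _check_items(publications, "publications", "id", "title", problems)
    return problems
```

Returning a list of problems instead of raising on the first one lets a report show everything wrong with an entry in a single pass.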

jamesaoverton avatar Jan 11 '21 16:01 jamesaoverton

I like it!

matentzn avatar Jan 11 '21 18:01 matentzn