Document the difference between the 4 released OWL files
EFO Release v3.25.0 contains the following OWL files:
- efo-base.owl (60.8 MB)
- efo.owl (175 MB)
- efo_otar_profile.owl (147 MB)
- efo_otar_slim.owl (130 MB)
I am curious as to how these files differ and haven't been able to find much information.
From an OpenTargets blog post:
we curated an extensive list of therapeutic areas that reflect the most appropriate body system, and therefore slimmed the ontology to ignore higher order terms (e.g. disease by anatomical system). The result is an EFO3-derived Open Targets Platform-specific profile-ontology which will be automatically generated with every monthly EFO release.
From opentargets/OnToma:
The ontology we use in the Open Targets platform is a subset (aka. slim) of the EFO ontology plus any HPO terms for which a valid EFO mapping could not be found.
Is there any other documentation I'm missing?
@dhimmel looking at the 3.11.0 release, the first release that contains those files, there is some info:
For Open Targets, we have also generated an Open Targets profile (which contains all of EFO with the new Open Targets therapeutic areas) and slim file (which contains just Open Targets therapeutic areas). Both are also attached to this release.
EFO vs EFO-OTAR node comparison
I compared efo.owl and efo_otar_profile.owl from v3.25.0 and found that EFO-OTAR adds 10 nodes and removes 57 nodes from EFO.
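For readers who want to try a similar comparison before the nxontology-based code is published, here is a minimal sketch using only the Python standard library. It is not the code behind the figures and tables in this comment. It assumes the OWL files are serialized as RDF/XML and simply collects the IRI of every named owl:Class, so the counts it gives may differ from the ones reported here (for example, if obsolete classes are handled differently). The helper names class_iris, compare_classes, and toy are hypothetical, and the example.org IRIs are placeholders.

```python
import xml.etree.ElementTree as ET

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def class_iris(owl_xml: str) -> set[str]:
    """Return the IRI of every named owl:Class in an RDF/XML string."""
    root = ET.fromstring(owl_xml)
    return {
        el.attrib[RDF + "about"]
        for el in root.iter(OWL + "Class")
        if RDF + "about" in el.attrib  # skip anonymous class expressions
    }


def compare_classes(base_xml: str, other_xml: str) -> tuple[list[str], list[str]]:
    """Return (added, removed): classes only in other, classes only in base."""
    base, other = class_iris(base_xml), class_iris(other_xml)
    return sorted(other - base), sorted(base - other)


def toy(*iris: str) -> str:
    """Build a tiny RDF/XML document declaring the given classes (demo only)."""
    header = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:owl="http://www.w3.org/2002/07/owl#">'
    )
    body = "".join(f'<owl:Class rdf:about="{iri}"/>' for iri in iris)
    return f"{header}{body}</rdf:RDF>"


# Placeholder IRIs standing in for efo.owl and efo_otar_profile.owl content.
base_doc = toy("http://example.org/A", "http://example.org/B")
other_doc = toy("http://example.org/A", "http://example.org/C")
added, removed = compare_classes(base_doc, other_doc)
print("added:", added)
print("removed:", removed)
```

For the real release files, which are well over 100 MB, reading from disk with ET.parse or ET.iterparse would likely be more practical than loading a string.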
Nodes added by EFO-OTAR
Here are the nodes EFO-OTAR adds (in purple outline) and their ancestors:
identifier | name | depth | n_ancestors | n_descendants | ic_resnik | ic_sanchez | uri |
---|---|---|---|---|---|---|---|
MONDO:0018797 | None | 5 | 6 | 1 | 0.98 | 1.00 | http://purl.obolibrary.org/obo/MONDO_0018797 |
OTAR:0000010 | respiratory or thoracic disease | 4 | 5 | 1147 | 0.50 | 0.31 | http://www.ebi.ac.uk/efo/OTAR_0000010 |
OTAR:0000019 | familial disease | 5 | 6 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000019 |
OTAR:0000008 | other | 4 | 5 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000008 |
OTAR:0000018 | genetic, familial or congenital disease | 4 | 5 | 7927 | 0.29 | 0.12 | http://www.ebi.ac.uk/efo/OTAR_0000018 |
OTAR:0000003 | cyst | 5 | 6 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000003 |
OTAR:0000014 | pregnancy or perinatal disease | 4 | 5 | 120 | 0.71 | 0.53 | http://www.ebi.ac.uk/efo/OTAR_0000014 |
OTAR:0000009 | injury, poisoning or other complication | 4 | 5 | 117 | 0.71 | 0.53 | http://www.ebi.ac.uk/efo/OTAR_0000009 |
OTAR:0000017 | reproductive system or breast disease | 4 | 5 | 859 | 0.53 | 0.34 | http://www.ebi.ac.uk/efo/OTAR_0000017 |
OTAR:0000006 | musculoskeletal or connective tissue disease | 4 | 5 | 3002 | 0.39 | 0.21 | http://www.ebi.ac.uk/efo/OTAR_0000006 |
One question I have: what is the purpose of adding "familial disease", "other", and "cyst", since these are all leaf nodes? Are they actually a helpful way for OpenTargets to categorize disease? CC @d0choa. MONDO:0018797 also has no descendants, but appears to be a relic, soon to be removed, as per https://github.com/EBISPOT/efo/issues/938.
Nodes removed by EFO-OTAR
Here are the nodes EFO-OTAR removes (in purple outline) and their ancestors:
identifier | name | depth | n_ancestors | n_descendants | ic_resnik | ic_sanchez | uri |
---|---|---|---|---|---|---|---|
MONDO:0044999 | scalp disease | 7 | 8 | 8 | 0.95 | 0.80 | http://purl.obolibrary.org/obo/MONDO_0044999 |
MONDO:0021017 | synaptopathy | 6 | 7 | 13 | 0.93 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0021017 |
MONDO:0019038 | rare maxillo-facial surgical disease | 8 | 16 | 222 | 0.75 | 0.47 | http://purl.obolibrary.org/obo/MONDO_0019038 |
MONDO:0043786 | serositis | 5 | 6 | 10 | 0.94 | 0.77 | http://purl.obolibrary.org/obo/MONDO_0043786 |
MONDO:0044974 | disease of supramolecular complex | 6 | 7 | 389 | 0.62 | 0.42 | http://purl.obolibrary.org/obo/MONDO_0044974 |
MONDO:0021635 | neurocristopathy | 5 | 8 | 134 | 0.75 | 0.52 | http://purl.obolibrary.org/obo/MONDO_0021635 |
MONDO:0044969 | disease of membrane bound organelle | 6 | 7 | 403 | 0.62 | 0.41 | http://purl.obolibrary.org/obo/MONDO_0044969 |
MONDO:0021668 | disorder involving pain | 4 | 5 | 13 | 0.90 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0021668 |
EFO:1000755 | pigmentation disease | 6 | 11 | 117 | 0.78 | 0.53 | http://www.ebi.ac.uk/efo/EFO_1000755 |
MONDO:0044980 | disease of signal transduction | 6 | 7 | 125 | 0.73 | 0.53 | http://purl.obolibrary.org/obo/MONDO_0044980 |
MONDO:0044979 | disease by cell type | 6 | 7 | 506 | 0.60 | 0.39 | http://purl.obolibrary.org/obo/MONDO_0044979 |
MONDO:0021197 | disease by cellular component affected | 5 | 6 | 1339 | 0.49 | 0.29 | http://purl.obolibrary.org/obo/MONDO_0021197 |
MONDO:0024623 | otorhinolaryngologic disease | 6 | 7 | 337 | 0.64 | 0.43 | http://purl.obolibrary.org/obo/MONDO_0024623 |
MONDO:0044975 | disease of transporter activity | 6 | 7 | 74 | 0.77 | 0.58 | http://purl.obolibrary.org/obo/MONDO_0044975 |
MONDO:0024627 | phagocytic cell dysfunction | 7 | 8 | 47 | 0.83 | 0.62 | http://purl.obolibrary.org/obo/MONDO_0024627 |
MONDO:0002436 | nasal disorder | 7 | 10 | 40 | 0.89 | 0.64 | http://purl.obolibrary.org/obo/MONDO_0002436 |
MONDO:0021073 | paraneoplastic syndrome | 5 | 6 | 9 | 0.93 | 0.78 | http://purl.obolibrary.org/obo/MONDO_0021073 |
MONDO:0018652 | biological anomaly without phenotypic characterization | 5 | 6 | 4 | 0.96 | 0.86 | http://purl.obolibrary.org/obo/MONDO_0018652 |
MONDO:0044989 | foot disease | 6 | 7 | 10 | 0.93 | 0.77 | http://purl.obolibrary.org/obo/MONDO_0044989 |
MONDO:0044987 | face disease | 7 | 8 | 1719 | 0.50 | 0.27 | http://purl.obolibrary.org/obo/MONDO_0044987 |
MONDO:0020683 | acute disease | 4 | 5 | 89 | 0.75 | 0.56 | http://purl.obolibrary.org/obo/MONDO_0020683 |
MONDO:0021195 | disease by cellular process disrupted | 5 | 6 | 2008 | 0.45 | 0.25 | http://purl.obolibrary.org/obo/MONDO_0021195 |
EFO:0000524 | head and neck disorder | 5 | 6 | 2103 | 0.45 | 0.25 | http://www.ebi.ac.uk/efo/EFO_0000524 |
EFO:0009470 | soft tissue disease | 4 | 5 | 124 | 0.72 | 0.53 | http://www.ebi.ac.uk/efo/EFO_0009470 |
MONDO:0024317 | chronic pain syndrome | 5 | 6 | 6 | 0.95 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0024317 |
EFO:0000405 | digestive system disease | 5 | 6 | 1236 | 0.51 | 0.30 | http://www.ebi.ac.uk/efo/EFO_0000405 |
MONDO:0021670 | post-infectious syndrome | 5 | 7 | 2 | 0.99 | 0.93 | http://purl.obolibrary.org/obo/MONDO_0021670 |
MONDO:0017368 | systemic disease with skin involvement | 6 | 7 | 42 | 0.83 | 0.63 | http://purl.obolibrary.org/obo/MONDO_0017368 |
MONDO:0021196 | disease by molecular activity disrupted | 5 | 6 | 251 | 0.65 | 0.46 | http://purl.obolibrary.org/obo/MONDO_0021196 |
Orphanet:79389 | Premature aging | 5 | 6 | 83 | 0.75 | 0.57 | http://www.orpha.net/ORDO/Orphanet_79389 |
MONDO:0021147 | disorder of development or morphogenesis | 4 | 5 | 3827 | 0.36 | 0.19 | http://purl.obolibrary.org/obo/MONDO_0021147 |
MONDO:0044977 | disease of receptor activity | 6 | 7 | 7 | 0.95 | 0.81 | http://purl.obolibrary.org/obo/MONDO_0044977 |
MONDO:0017261 | systemic diseases with panuveitis | 6 | 7 | 6 | 0.96 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0017261 |
EFO:0009714 | chronic disease | 4 | 5 | 107 | 0.73 | 0.54 | http://www.ebi.ac.uk/efo/EFO_0009714 |
MONDO:0021674 | post-viral disorder | 5 | 6 | 56 | 0.79 | 0.61 | http://purl.obolibrary.org/obo/MONDO_0021674 |
MONDO:0002254 | syndromic disease | 4 | 5 | 2541 | 0.39 | 0.23 | http://purl.obolibrary.org/obo/MONDO_0002254 |
MONDO:0021673 | post-bacterial disorder | 5 | 6 | 1 | 0.98 | 1.00 | http://purl.obolibrary.org/obo/MONDO_0021673 |
EFO:0009903 | inflammatory disease | 4 | 5 | 597 | 0.57 | 0.37 | http://www.ebi.ac.uk/efo/EFO_0009903 |
MONDO:0021199 | disease by anatomical system | 4 | 5 | 10922 | 0.27 | 0.09 | http://purl.obolibrary.org/obo/MONDO_0021199 |
MONDO:0005042 | head disease | 6 | 7 | 2012 | 0.47 | 0.25 | http://purl.obolibrary.org/obo/MONDO_0005042 |
MONDO:0024626 | defective phagocytic cell engulfment | 6 | 10 | 8 | 0.96 | 0.80 | http://purl.obolibrary.org/obo/MONDO_0024626 |
MONDO:0044971 | disease of macromolecular complex | 6 | 7 | 155 | 0.72 | 0.51 | http://purl.obolibrary.org/obo/MONDO_0044971 |
MONDO:0020595 | disease of retroperitoneum | 6 | 7 | 18 | 0.94 | 0.72 | http://purl.obolibrary.org/obo/MONDO_0020595 |
EFO:0009479 | throat disease | 6 | 7 | 1 | 0.99 | 1.00 | http://www.ebi.ac.uk/efo/EFO_0009479 |
MONDO:0017259 | systemic diseases with anterior uveitis | 6 | 7 | 13 | 0.92 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0017259 |
MONDO:0021016 | channelopathy | 7 | 8 | 57 | 0.81 | 0.60 | http://purl.obolibrary.org/obo/MONDO_0021016 |
MONDO:0044965 | abdominal and pelvic region disorder | 5 | 6 | 977 | 0.53 | 0.32 | http://purl.obolibrary.org/obo/MONDO_0044965 |
MONDO:0020012 | systemic or rheumatic disease | 4 | 5 | 312 | 0.62 | 0.44 | http://purl.obolibrary.org/obo/MONDO_0020012 |
MONDO:0024505 | disorder by anatomical region | 4 | 5 | 4746 | 0.35 | 0.17 | http://purl.obolibrary.org/obo/MONDO_0024505 |
MONDO:0015938 | systemic disease | 5 | 6 | 257 | 0.65 | 0.46 | http://purl.obolibrary.org/obo/MONDO_0015938 |
MONDO:0044976 | disease of catalytic activity | 6 | 7 | 173 | 0.70 | 0.49 | http://purl.obolibrary.org/obo/MONDO_0044976 |
MONDO:0017260 | systemic diseases with posterior uveitis | 7 | 8 | 4 | 0.98 | 0.86 | http://purl.obolibrary.org/obo/MONDO_0017260 |
EFO:0009664 | disease of orbital region | 6 | 9 | 1481 | 0.52 | 0.28 | http://www.ebi.ac.uk/efo/EFO_0009664 |
MONDO:0044967 | limb disorder | 5 | 6 | 69 | 0.78 | 0.58 | http://purl.obolibrary.org/obo/MONDO_0044967 |
MONDO:0044990 | hand disease | 6 | 7 | 6 | 0.95 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0044990 |
EFO:0001058 | sensory system disease | 6 | 7 | 291 | 0.64 | 0.44 | http://www.ebi.ac.uk/efo/EFO_0001058 |
MONDO:0021194 | disease by subcellular system affected | 4 | 5 | 2901 | 0.40 | 0.22 | http://purl.obolibrary.org/obo/MONDO_0021194 |
Code
Code to produce these figures and tables is not yet available, but is based on nxontology. I hope to make the nxontology importer for EFO available soon.
Thanks @dhimmel for the analysis. It's really useful. @zoependlington can provide more details.
From the Open Targets perspective, the background story behind the slim was that we wanted to align EFO to a more clinical interpretation. EFO has a lot of high-level organisational nodes that attend to anatomical characteristics (many of them can be seen in your analysis). However, they have little or no clinical value (e.g. disease by anatomical system). Instead, the top nodes of the slim resemble other clinical classifications like MedDRA.
In the process of reorganising the terms, a few terms had to be removed, relocated or split. You can find the logic behind most of the changes in the respective tickets. For the ones that you raised, I found the following:
- Familial disease. In this case, I believe the term was not populated. The ticket describes the actions to take but it's still open and probably never implemented.
- Cyst. I suspect this is a similar issue. The Cyst term was never populated with the many cyst-related diseases contained in EFO.
@paolaroncaglia and @zoependlington can comment on these two.
Regarding Other, it's a placeholder for newly introduced terms in EFO that have no parentage relationship in the slim. We aimed to have it empty, as all diseases should be children of other root level terms (therapeutic areas). You can consider it an artefact of the process and we should eventually remove it.
Quoting @zoependlington from https://github.com/EBISPOT/efo/issues/927#issuecomment-760762229 regarding forced relationships in EFO-OTAR:
The forced relationships are defined in the subclasses templates file found in the temporary/working home of OTAR_profiler here: https://github.com/EBISPOT/otar_profiler
Just a note that the "final" version for use by Open Targets is the slim file, which only contains the therapeutic areas that are useful for annotating their data. The profile is our master EFO with a few extra terms, which will eventually be added to the master EFO file once we have completed the ongoing work with our profile and slim files to be compatible with the Open Targets pipelines and their needs.
Great to know about EBISPOT/otar_profiler. I see that otar_ta.sh is the script that creates efo_otar_profile.owl and efo_otar_slim.owl. allTAs.txt contains a list of therapeutic areas and newterms.tsv contains nodes added by EFO-OTAR.
Based on otar_ta.sh, it looks like efo_otar_slim.owl is derived from efo_otar_profile.owl by filtering to therapeutic areas and their descendants (via robot MIREOT --branch-from-terms). So this is useful for OpenTargets, which wants a hierarchy of diseases only, without other parts of the ontology?
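To make "filtering to therapeutic areas and their descendants" concrete, here is a rough sketch of the idea in plain Python. It is not what otar_ta.sh or ROBOT actually runs, only an illustration of keeping each root term plus everything reachable below it. The function name slim_terms and the toy hierarchy are hypothetical; in the real pipeline the roots would presumably come from allTAs.txt.

```python
from collections import deque


def slim_terms(children: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Return the root terms plus every term reachable through child links."""
    keep: set[str] = set()
    queue = deque(roots)
    while queue:
        term = queue.popleft()
        if term in keep:
            continue  # already visited (terms can have several parents)
        keep.add(term)
        queue.extend(children.get(term, []))
    return keep


# Hypothetical parent -> children hierarchy, for illustration only.
toy_children = {
    "disease": ["therapeutic area 1", "organisational node"],
    "therapeutic area 1": ["disease x", "disease y"],
    "organisational node": ["disease y", "disease z"],
    "measurement": ["some measurement"],
}

kept = slim_terms(toy_children, ["therapeutic area 1"])
print(sorted(kept))
```

In this toy example, terms outside the chosen branch (the organisational node, measurements, and the top-level "disease" term) are dropped, which matches my reading of what the slim is meant to achieve.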
Regarding "the profile is our master EFO with a few extra terms, which will eventually be added to the master EFO file", does that mean the eventual plan is to take all the modifications in efo_otar_profile.owl and move them upstream to efo.owl? If so, does that mean efo_otar_profile.owl might eventually go away, because it would be the same as efo.owl? And does this also mean EFO intends to remove the "organisational nodes that attend to anatomical characteristics" in favor of EFO-OTAR's "clinical interpretation"?
Getting back to the original documentation request, it would be nice to have guidance in the README regarding when to use efo-base.owl, efo.owl, efo_otar_profile.owl, versus efo_otar_slim.owl. My current understanding is:
- efo-base.owl: use if you only want terms from the EFO namespace (subClassOf relationships might be incomplete?)
- efo.owl: use if you want the primary EFO release with terms from the EFO namespace and those imported from other ontologies
- efo_otar_profile.owl: use if you want the complete ontology, with modifications introduced by OpenTargets, which might eventually be adopted in efo.owl.
- efo_otar_slim.owl: use if you want an ontology of diseases rooted to therapeutic areas, as defined and used by OpenTargets
Is this understanding correct?