sssom Mappings: Document manual curation guidelines for predicates

Open matentzn opened this issue 2 years ago • 8 comments

This goes to our best practice guide.

@sbello from MGI asked in slack:

How broad is too broad? To put that another way, where is the line at which you decide that there is no mapped term? I'm finding that broad can be, well, quite broad in use. I have some terms that are slightly broader (and maybe close would be better here) and other cases where it really is quite broad. For example I have:

- Congenital diaphragmatic hernia (HP:0000776) mapped to diaphragmatic hernia (MP:0003924) with broad (MP term broader than HP term), as the MP term could apply to non-congenital hernias. I feel pretty confident that the MP term is broader, but not too much broader.
- Ptosis (HP:0000508) mapped to blepharoptosis (MP:0001344) with broad, as the HP definition is very human-specific, but maybe in this case it should be close?
- Hypoplastic left heart (HP:0004383) mapped to heart hypoplasia (MP:0002740) with broad. We plan to create a new child term of the MP term that is specific for left heart but haven't yet.
- Abnormality of the kidney (HP:0000077) mapped to renal/urinary system phenotype (MP:0005367) with broad. In this case the MP term is waaaayyyyy broader, but given the structural differences between the MP and HP I have no closer term and no plan to create a new term.

We run into this with many of the HP "Abnormality of X" terms where, even if the MP has a high-level term for X (e.g. cardiovascular system), we don't have a general "abnormality of X". The MP has "X phenotype" (which is used in MGI to annotate normal phenotypes) and then we split into morphology and physiology. Is there any general guidance on how broad a match should be?

I think there are two important assumptions to consider before giving an answer:

- The dichotomy of owl:equivalentClass / rdfs:subClassOf on the one side, and skos:exactMatch / skos:broadMatch on the other.
- The purpose for which a mapping is created.

My initial intuition is this:

- We use OWL/logical vocabulary whenever we can reasonably assume that the intention of both semantic spaces is to reflect the exact same kind of thing. For phenotype ontologies, we assume that there are at least the taxon constraints between the concepts to consider, which is why we use the SKOS vocabulary for them.
- Given its purpose, the mapping should always be
  - as general as possible
  - as specific as necessary

Keeping those in mind, this is my suggestion in answer to @sbello's great question:

A broadMatch/subClassOf should be added if and only if
- there is no exact match
- there is no match that is closer (i.e. narrower than the proposed match, but broader than exact)
Following from the above, I think no match is too broad, but the broader it is, the less useful it becomes. I would suggest that
- the broadest useful mapping should be the COB-root line. So: a broad mapping to "chemical entity" may be ok, but a broad mapping to "material entity" is probably too broad to be useful.
- The mapping set should document in a comment what the threshold for broadness is:
  - "if no suitable mapping was found, we mapped to the categorical term 'phenotype'"
  - "if no suitable mapping was found, we mapped to the highest phenotypic grouping under 'phenotype'"

Something like that. In the future, I would like to explore this a bit more. I think it's a hugely important thing to get right if we want to answer questions like "Is there a mapping for term x at all?" and others.
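To make the examples concrete, two of the broad mappings from the question could be recorded roughly as below. This is only a sketch: the column names follow SSSOM slot names as I understand them, the `semapv:ManualMappingCuration` justification value and the wording of the comments are assumptions, and `to_sssom_tsv` is a hypothetical helper, not part of any SSSOM tooling.

```python
import csv
import io

# Column names are assumed to match SSSOM slots; check them against the spec.
COLUMNS = [
    "subject_id", "subject_label", "predicate_id",
    "object_id", "object_label", "mapping_justification", "comment",
]

# HP terms are the subjects; skos:broadMatch says the MP object is broader.
MAPPINGS = [
    {
        "subject_id": "HP:0000776",
        "subject_label": "Congenital diaphragmatic hernia",
        "predicate_id": "skos:broadMatch",
        "object_id": "MP:0003924",
        "object_label": "diaphragmatic hernia",
        "mapping_justification": "semapv:ManualMappingCuration",  # assumed value
        "comment": "MP term could also apply to non-congenital hernias.",
    },
    {
        "subject_id": "HP:0000077",
        "subject_label": "Abnormality of the kidney",
        "predicate_id": "skos:broadMatch",
        "object_id": "MP:0005367",
        "object_label": "renal/urinary system phenotype",
        "mapping_justification": "semapv:ManualMappingCuration",  # assumed value
        "comment": "Much broader; no closer MP term exists and none is planned.",
    },
]


def to_sssom_tsv(rows):
    """Serialise mapping rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_sssom_tsv(MAPPINGS))
```

The per-row comment is where the reason for accepting a very broad match would go, in line with the suggestion to document the broadness threshold.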

Sep 27 '21 09:09 matentzn

@sbello also asked about close vs narrow/broad. I responded:

This is another hugely important issue: close vs narrow/broad. Intuition: narrow should be used if the HP term is “kinda like a subclass”, broad if it's “kinda like a superclass”, and close if it's “kinda not like a sub or super class, perhaps a sibling class”. “relatedMatch” should be used if it's none of the above, but the term you map to lives in a completely separate branch of the ontology (say you map an exposure to a chemical to a chemical).

Sep 27 '21 14:09 matentzn

@matentzn what would you use to relate these types of examples:

- Neurofibromas (HP:0001067) to increased neurofibroma incidence (MP:0010314)
- Neuroblastoma (HP:0003006) to increased neuroblastoma incidence (MP:0002039)

Basically the difference is that the HP has terms for the tumors and the MP has terms for the incidence of the tumors. We've been using related for these. Do you think that is correct, or would close be better?

Sep 27 '21 15:09 sbello

Devil is in the details :)

I am assuming that "increased incidence" refers to the likelihood of getting one more.

In this case, I would say they are related. Questions to ask are:

- Are they exact? NO
- Does Neurofibromas imply increased neurofibroma incidence? NO (if YES, then BROAD)
- Does increased neurofibroma incidence imply Neurofibromas? NO (if YES, then NARROW)
- What is the common parent of increased neurofibroma incidence and Neurofibromas? Are they sort of similar? Then CLOSE.
- Else RELATED.

Related really is the free-for-all. You should add a comment on why you believe they are related and, if possible, how they are related. There may be another relation here that is more applicable, and we would want to investigate it, like something causal. But for your use case, this is enough.
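The sequence of questions above can be sketched as a small function. This is an illustration only: `choose_predicate` and its parameter names are hypothetical, the yes/no answers still have to come from a curator, and the SKOS predicate names are my reading of the EXACT/BROAD/NARROW/CLOSE/RELATED labels used here.

```python
def choose_predicate(
    is_exact: bool,
    subject_implies_object: bool,
    object_implies_subject: bool,
    similar_under_common_parent: bool,
) -> str:
    """Return a SKOS mapping predicate by asking the questions in order."""
    if is_exact:
        return "skos:exactMatch"
    if subject_implies_object:
        # The object behaves like a superclass of the subject.
        return "skos:broadMatch"
    if object_implies_subject:
        # The object behaves like a subclass of the subject.
        return "skos:narrowMatch"
    if similar_under_common_parent:
        # Neither implies the other, but they are sibling-like.
        return "skos:closeMatch"
    # Free-for-all: add a comment saying why and how they are related.
    return "skos:relatedMatch"


# Neurofibromas (HP:0001067) vs increased neurofibroma incidence (MP:0010314):
# every question was answered NO in this thread.
print(choose_predicate(False, False, False, False))
```

The order of the checks matters: each later predicate is only considered once the more precise ones have been ruled out.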

Sep 27 '21 15:09 matentzn

Wouldn't it be better to use a different kind of relationship, like increased neurofibroma incidence (MP:0010314) <INCREASES> Neurofibromas (HP:0001067)? Maybe RO:0002213 (positively regulates) would be good, or it might be even better to mint a sub-relationship "positively regulates amount"? It seems like there's more of an ontological statement to be made here.

Sep 27 '21 15:09 cthoyt

I agree, but this goes beyond the use case that @sbello needs to cover. The question of the relationship needs to be dealt with in uPheno, and generally for all of phenotype modelling. Her task is to produce a reasonable mapping between MP and HP for some set of terms at the moment; I will use her comments, though, to think about the right modelling at the ontology level.

Sep 27 '21 15:09 matentzn

Chiming in, feel free to ignore. I agree it would be ideal to allow for more formally defined and expressive axioms around the mappings, but also worry that it quickly would get too complex for most to do consistently or accurately. Not that most people couldn't do this, but I am guessing they will find it overwhelming. I'd also think if you allow people to do this you'd also want to enforce the use of a reasoner afterward to verify consistency. Just my 12.5 ¢. :D

Sep 27 '21 15:09 callahantiff

I think you are absolutely right @callahantiff

Relationships etc. should be done at the pattern level, see https://github.com/obophenotype/upheno/wiki/Phenotype-Ontologies-Reconciliation-Effort

Mappings should be a level down in terms of semantic precision - especially if we want this whole business to scale.

Sep 27 '21 15:09 matentzn

I have to agree, as you can see by my incessant questions to Nico :) I'm already struggling with fine distinctions between the existing terms. @Nico, I think your description in terms of parent/sibling/child/different branch is very helpful.

Sep 27 '21 16:09 sbello
