Proposal for accommodating complex mappings

Open matentzn opened this issue 3 years ago • 0 comments

Note: all comments in the following apply to both subject and object fields.

The normal mapping case is when one term in a source set (subject) is mapped to exactly one term in the target set (object). There are, however, many cases where we need to map sets of terms (subject and/or object), for example:

UBERON:Eye+NCBITaxon:Xenopus -> XAO:Eye
MP:adiposeTissuePhenotype+PATO:abnormal -> HP:AbnormallyAdiposeTissue
MP:X due to DO:1 -> HP:Y due to MONDO:1

This can become a complicated mess, but I suggest the following:

  1. We allow pipe-separated term lists for both subject_id and object_id. These lists are considered in the order given.
  2. We introduce a new (optional) field called object_pattern, which is by default none (meaning that everything in subject_id is considered to be a single identifier pertaining to one term). If someone wishes to create a complex mapping, they would write a complex expression like RO:001 some (%s and (RO:002 some %s)), or simply %s and %s (see how we did this in a different context using templates). The filler slots (%s) are filled one by one, in order, with the terms from the pipe-separated list in subject_id, which materialises the expression, for example as an owl_class_expression.
  3. We introduce a new (optional) field called object_pattern_type, which, if not set, is interpreted as "class expression in Manchester syntax" (so there is no need to set it). This could be used in the future to accommodate other kinds of patterns as well (there are complex expressions, for example in the RBox, that are not class expressions, and maybe someone wants to use this to map non-OWL patterns as well).
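To make the filling rule in points 1 and 2 concrete, here is a minimal sketch in Python. Nothing in it is part of SSSOM: the function name `materialise` is made up for illustration, and pairing the pattern `%s and (RO:002 some %s)` with the Xenopus eye terms is only an assumed example. It splits the pipe-separated list, checks that the number of terms matches the number of `%s` slots, and fills the slots in the order given.

```python
from typing import Optional


def materialise(term_ids: str, pattern: Optional[str] = None) -> str:
    """Fill the %s slots of `pattern` with the pipe-separated terms in `term_ids`.

    Hypothetical helper, not an SSSOM API. With no pattern, the whole
    string is treated as a single identifier (the default case).
    """
    if pattern is None:
        return term_ids

    terms = term_ids.split("|")
    # Splitting on the slot marker avoids printf-style formatting,
    # which would choke on any other '%' in the expression.
    parts = pattern.split("%s")
    slots = len(parts) - 1
    if slots != len(terms):
        raise ValueError(f"pattern has {slots} slots but {len(terms)} terms were given")

    # Interleave the fixed text with the terms, in the order given.
    expression = parts[0]
    for term, rest in zip(terms, parts[1:]):
        expression += term + rest
    return expression


print(materialise("UBERON:Eye|NCBITaxon:Xenopus", "%s and (RO:002 some %s)"))
```

Whether a mismatch between slots and terms should be an error or handled some other way is an open design question; the sketch simply rejects it.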

Does this make sense?

@cmungall @kshefchek @mellybelly @diatomsRcool @balhoff

matentzn, Aug 22 '20 13:08