
Allow multiple match types

Open matentzn opened this issue 5 years ago • 6 comments

So far, we have allowed only a single match_type per mapping (for example LexicalMatch or LogicalMatch), on the idea that if a term maps in multiple ways, we simply create multiple mappings. Some people did not like this idea: they want to be able to say that a single mapping is both a LexicalMatch and a LogicalMatch. This issue is the place to discuss the matter, but I am inclined to grant the request and turn match_type into a |-separated list that means, strictly: "this mapping can be derived via multiple routes".

The main argument against this is that several metadata elements refer directly to the match_type (what would subject_match_field, for example, refer to if multiple match_types are chosen? All of the matches?). The main argument for it is that a user can immediately see the strong evidence for a mapping.

My personal sense of clarity still leans towards a single match_type, to avoid confusion about how to interpret the other metadata, but I can see the appeal for other users, and will therefore simply do this (multiple match_types) if there are no further arguments against.
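To make the two options concrete, here is a minimal Python sketch that converts between "one row per match type" and "one row with a |-separated match_type". The row layout and the `collapse`/`expand` helpers are assumptions made for illustration only; they are not part of SSSOM or any SSSOM tooling, and the sketch ignores the other metadata fields whose interpretation is the open question.

```python
from collections import defaultdict

# Hypothetical rows: one mapping per (subject, object, match_type).
rows = [
    {"subject_id": "A:1", "object_id": "B:1", "match_type": "LexicalMatch"},
    {"subject_id": "A:1", "object_id": "B:1", "match_type": "LogicalMatch"},
    {"subject_id": "A:2", "object_id": "B:2", "match_type": "LexicalMatch"},
]


def collapse(rows):
    """Merge rows for the same subject/object pair into one row
    whose match_type is a |-separated list."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["subject_id"], row["object_id"])
        if row["match_type"] not in grouped[key]:
            grouped[key].append(row["match_type"])
    return [
        {"subject_id": s, "object_id": o, "match_type": "|".join(types)}
        for (s, o), types in grouped.items()
    ]


def expand(rows):
    """Split a |-separated match_type back into one row per match type."""
    return [
        {**row, "match_type": match_type}
        for row in rows
        for match_type in row["match_type"].split("|")
    ]


collapsed = collapse(rows)
expanded = expand(collapsed)
```

The round trip only works here because each row carries nothing but the match type; as soon as a field such as subject_match_field is attached to a row, collapsing has to decide which match type that field belongs to.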

matentzn avatar Aug 22 '20 13:08 matentzn

This came in part from me: sometimes we only want to create a mapping if it meets multiple criteria. Could also have multiple rows. But there is still no way to say what the mapping criteria were, and we want to be able to say that it was more than one match type.

mellybelly avatar Aug 25 '20 23:08 mellybelly

I think your request makes sense! We just need to decide, given the simplicity of the model, what to do about metadata fields that are in effect about the match type (see above).

matentzn avatar Aug 26 '20 10:08 matentzn

I am now tending again towards multiple rows, because it gets really cumbersome if you have, say, two different lexical matches (label-exactsyn, exactsyn-exactsyn, etc.).

matentzn avatar Jul 16 '21 08:07 matentzn

However, this won't play well with algorithms that often have complex reasons for a match, and then a combined confidence. Maybe these should emit all evidence for a match separately, and then another one combined? @ernestojimenezruiz any ideas how we could capture such combinations of evidence effectively?

matentzn avatar Jul 16 '21 08:07 matentzn

I do not have a strong preference here. For LogMap, for example, all discovered mappings are lexical matches with some additional characteristics: reasonable lexical similarity, similar neighborhoods, and not leading to (many) logical conflicts. In this sense I like this multiple match_type field. Most systems will produce a lexical match and optionally another match type. S-Match is probably one of the few systems computing pure logic-based mappings.

Alternatively, I understand Nico's preference of keeping the mapping definition simple. The work then falls to the one selecting a (sub)set of the mappings that are "good" for an application: e.g., not only lexical, proposed by more than one system, etc.
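As a rough illustration of that selection step, the following Python sketch picks subject/object pairs that are supported by more than one system and, optionally, by more than lexical evidence. The `system` field, the sample rows, and the `select` helper are hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical single-match_type rows from two made-up systems.
mappings = [
    {"subject_id": "A:1", "object_id": "B:1", "match_type": "LexicalMatch", "system": "sysX"},
    {"subject_id": "A:1", "object_id": "B:1", "match_type": "LogicalMatch", "system": "sysY"},
    {"subject_id": "A:2", "object_id": "B:2", "match_type": "LexicalMatch", "system": "sysX"},
    {"subject_id": "A:3", "object_id": "B:3", "match_type": "LexicalMatch", "system": "sysX"},
    {"subject_id": "A:3", "object_id": "B:3", "match_type": "LexicalMatch", "system": "sysY"},
]


def select(mappings, min_systems=2, require_non_lexical=False):
    """Return the (subject, object) pairs with enough supporting evidence."""
    evidence = defaultdict(lambda: {"systems": set(), "types": set()})
    for m in mappings:
        entry = evidence[(m["subject_id"], m["object_id"])]
        entry["systems"].add(m["system"])
        entry["types"].add(m["match_type"])

    selected = []
    for pair, entry in evidence.items():
        if len(entry["systems"]) < min_systems:
            continue  # proposed by too few systems
        if require_non_lexical and entry["types"] <= {"LexicalMatch"}:
            continue  # only lexical evidence
        selected.append(pair)
    return selected


agreed = select(mappings)
agreed_non_lexical = select(mappings, require_non_lexical=True)
```

With one row per match type, the evidence stays separate in the data and the consumer decides how to combine it for a given application.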

It would also be interesting to be able to annotate sets of mappings (not necessarily from a single source), by adding some metadata about the quality (lexical, structural, logical), voted by different systems/sources, manually curated, if leading to logical errors, etc. Not sure if SSSOM takes this into account.

ernestojimenezruiz avatar Jul 20 '21 12:07 ernestojimenezruiz

For now, we will stick with single match type (recently renamed to mapping justification). However, it will be possible to define an "aggregated mapping" in the next version of SSSOM, which is basically a mapping statement that can refer to multiple justifications.
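One way to picture such an aggregated mapping is sketched below in Python. This is purely illustrative: the `Justification` and `AggregatedMapping` classes, the `detail` field, and the `aggregate` helper are assumptions for the sake of the example, not the SSSOM data model. The point is that each justification keeps its own metadata, so two different lexical matches for the same pair do not have to share a single field.

```python
from dataclasses import dataclass, field


@dataclass
class Justification:
    kind: str          # e.g. "LexicalMatch" (hypothetical value)
    detail: str = ""   # hypothetical per-justification metadata


@dataclass
class AggregatedMapping:
    subject_id: str
    object_id: str
    justifications: list = field(default_factory=list)


def aggregate(rows):
    """Group single-justification rows into one aggregated mapping per pair."""
    by_pair = {}
    for row in rows:
        key = (row["subject_id"], row["object_id"])
        agg = by_pair.setdefault(key, AggregatedMapping(*key))
        agg.justifications.append(
            Justification(row["mapping_justification"], row.get("detail", ""))
        )
    return list(by_pair.values())


# Hypothetical rows: two lexical matches and one logical match for one pair.
rows = [
    {"subject_id": "A:1", "object_id": "B:1",
     "mapping_justification": "LexicalMatch", "detail": "label-exactsyn"},
    {"subject_id": "A:1", "object_id": "B:1",
     "mapping_justification": "LexicalMatch", "detail": "exactsyn-exactsyn"},
    {"subject_id": "A:1", "object_id": "B:1",
     "mapping_justification": "LogicalMatch"},
]

aggregated = aggregate(rows)
```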

matentzn avatar Jun 03 '22 11:06 matentzn