
Extend what's allowed in `curie_map` to enable extended prefix maps

Open cthoyt opened this issue 2 years ago • 2 comments

Currently, the curie_map element takes a dictionary with string keys and string values. I propose we extend the data model of what can go in here:

  1. If a string is given, treat it as a URL pointing to an (extended) prefix map. The target should be a JSON file: if its top-level value is a list, it is an EPM; if it contains an `@context` element, it is a JSON-LD context; otherwise, treat it as a simple prefix map.
  2. If a list is given, interpret it as an extended prefix map.
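
A minimal sketch of the dispatch proposed above, in Python. The function names and the `fetch_json` callable are hypothetical and only illustrate the type checks; they are not part of any existing SSSOM tooling.

```python
def classify_prefix_map_document(document):
    """Guess the kind of prefix map held by an already-parsed JSON document."""
    if isinstance(document, list):
        return "extended"  # a list of records is taken to be an EPM
    if isinstance(document, dict):
        if "@context" in document:
            return "jsonld"  # a JSON-LD context wraps the map under @context
        return "simple"  # plain prefix -> URI prefix dictionary
    raise TypeError(f"unsupported prefix map document: {type(document).__name__}")


def interpret_curie_map(value, fetch_json):
    """Return (kind, content) for a curie_map value under the proposed rules.

    fetch_json is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns the parsed JSON found there.
    """
    if isinstance(value, dict):
        return "simple", value  # current behaviour: inline simple map
    if isinstance(value, list):
        return "extended", value  # proposed rule 2: inline EPM
    if isinstance(value, str):
        document = fetch_json(value)  # proposed rule 1: URL to a JSON file
        return classify_prefix_map_document(document), document
    raise TypeError(f"unsupported curie_map value: {type(value).__name__}")
```

Passing the fetcher in as an argument keeps the sketch free of network code and makes the branching easy to test with a stub.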

cthoyt avatar Nov 28 '23 14:11 cthoyt

Strongly opposed to any kind of extension where we need to peek into the contents of a field to guess its type of value.

In fact the curie_map used to be defined as “either a URL pointing to a curie map, OR the curie map itself”. We changed that in #284 because it was agreed that such Frankenstein-typed slots, where the same slot can be either a string or a dictionary, were a bad idea.

If different types of curie map are desired (e.g. simple or extended), or if a curie map can be either included directly or referenced from an external resource, then we should use different slots (e.g. curie_map for an included simple map, curie_map_ref for a link to an external simple map, extended_map for an included EPM, extended_map_ref for a link to an external EPM).
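
To illustrate the one-slot-one-type idea, here is a rough Python sketch of a validator. The slot names are the examples given above; the function itself is hypothetical and not taken from any SSSOM implementation.

```python
# Each slot accepts exactly one type, so no value needs to be inspected
# to find out what kind of map it holds.
EXPECTED_SLOT_TYPES = {
    "curie_map": dict,         # included simple map
    "curie_map_ref": str,      # link to an external simple map
    "extended_map": list,      # included EPM
    "extended_map_ref": str,   # link to an external EPM
}


def validate_prefix_slots(metadata):
    """Return a list of error messages for slots holding the wrong type."""
    errors = []
    for slot, expected_type in EXPECTED_SLOT_TYPES.items():
        if slot in metadata and not isinstance(metadata[slot], expected_type):
            errors.append(f"{slot} must be a {expected_type.__name__}")
    return errors
```

With this layout, a string in `curie_map` is simply an error rather than something to be guessed at.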

For what it’s worth I am mildly against allowing the use of a pointer to an external map (simple or extended). I think that SSSOM mapping sets should be self-sufficient and should not require accessing an external resource to be used.

I am also unconvinced that an EPM brings anything useful in the context of a mapping set. When a mapping set contains a MESH:12345678 curie, all I need is to know what URL prefix MESH stands for (which the simple curie map provides). Why would I need to know all the alternative prefix names or URL prefixes associated with the MESH namespace?

I do understand that one might want to reconcile the prefixes used in one dataset to fit the “preferred URLs” that person wants (or needs). For example, if I get a dataset that was provided to me with

#curie_map:
#  MESH: "http://meshb.nlm.nih.gov/record/ui?ui="

and for some reason my application requires MESH IDs to use the http://id.nlm.nih.gov/mesh/ form, then of course I would use an EPM (where the “preferred prefix” for MESH is the one I need, such as the OBO EPM) to automatically remap the MESH curies. But in that case it’s up to me to provide an EPM that suits my needs. Whoever creates the dataset cannot know in advance which EPM I need, so what would be the benefit of including (or referring to) an EPM in the set?
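
A rough Python sketch of that consumer-side remapping, to make the scenario concrete. The EPM record below is hypothetical (its field names follow the usual extended prefix map layout of `prefix`, `uri_prefix` and `uri_prefix_synonyms`), and the helper functions are illustrative only.

```python
def expand(curie, curie_map):
    """Expand a CURIE to a full URI using the dataset's own simple curie map."""
    prefix, _, local_id = curie.partition(":")
    return curie_map[prefix] + local_id


def standardize_uri(uri, epm):
    """Rewrite a URI so that it uses the preferred URI prefix from an EPM.

    The longest matching URI prefix (preferred or synonym) wins; a URI that
    matches no record is returned unchanged.
    """
    best = None  # (matched prefix, preferred prefix)
    for record in epm:
        candidates = [record["uri_prefix"], *record.get("uri_prefix_synonyms", [])]
        for candidate in candidates:
            if uri.startswith(candidate) and (best is None or len(candidate) > len(best[0])):
                best = (candidate, record["uri_prefix"])
    if best is None:
        return uri
    matched, preferred = best
    return preferred + uri[len(matched):]


# The map shipped with the dataset, as in the example above.
dataset_map = {"MESH": "http://meshb.nlm.nih.gov/record/ui?ui="}

# An EPM chosen by the consumer (hypothetical content).
my_epm = [
    {
        "prefix": "MESH",
        "uri_prefix": "http://id.nlm.nih.gov/mesh/",
        "uri_prefix_synonyms": ["http://meshb.nlm.nih.gov/record/ui?ui="],
    }
]

uri = standardize_uri(expand("MESH:12345678", dataset_map), my_epm)
```

Note that the dataset only contributes its simple map here; the EPM comes entirely from the consumer.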

gouttegd avatar Feb 05 '24 18:02 gouttegd

Ah, and if we do allow referring to an external map: strongly opposed to allowing the external map to be represented as a JSON-LD @context. This is the Simple Standard for Sharing Ontological Mappings, so let’s keep it simple and not bring needlessly complex stuff just because we can.

gouttegd avatar Feb 05 '24 18:02 gouttegd