
Add JSKOS reader

Open nichtich opened this issue 1 year ago • 8 comments

General ideas about support of JSKOS format are given in #334.

Will you also provide a jskos reader? This would be the direction that is most valuable for us, at least :) 🙏

Originally posted by @matentzn in https://github.com/mapping-commons/sssom-py/issues/334#issuecomment-1413871661

nichtich avatar Feb 03 '23 07:02 nichtich

Open issues or questions

nichtich avatar Feb 03 '23 09:02 nichtich

Current JSKOS data does not include mapping_justification. In most cases it would be semapv:ManualMappingCuration, but the reader does not know. Shall we use the default semapv:MappingActivity or just leave the field empty?

you can use "UnspecifiedMatching" (make sure I spelled it correctly)

Crafting a CURIE from a URI is not obvious in all cases. A simple solution would be https://github.com/mapping-commons/sssom/discussions/188. If information about the involved terminologies/ontologies is provided with a JSKOS namespace for each, a shorter form with CURIEs can be produced.

This is not easy, but the translation process allows the user to specify custom prefix maps, which are used whenever the translation is not obvious with the standard prefixes.
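As a rough illustration of how a custom prefix map can drive the URI-to-CURIE step, here is a minimal sketch in plain Python. It is not the sssom-py implementation; the function name `uri_to_curie`, the `PREFIX_MAP` contents, and the longest-namespace-wins rule are assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical prefix map; in practice it would come from user-supplied metadata.
PREFIX_MAP: Dict[str, str] = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ddc": "http://dewey.info/class/",
}


def uri_to_curie(uri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Return a CURIE for `uri`, or None if no namespace in the map matches."""
    best = None
    for prefix, namespace in prefix_map.items():
        # Prefer the longest matching namespace so nested namespaces resolve predictably.
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        return None
    prefix, namespace = best
    return f"{prefix}:{uri[len(namespace):]}"
```

Returning `None` for unknown namespaces leaves the caller free to keep the full URI or to report the record as not convertible.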

matentzn avatar Feb 06 '23 11:02 matentzn

First draft at https://github.com/gbv/sssom-py/commit/5cfa45bc3e60b9dfd32ed03981d370e060118783, it works with an example via:

sssom parse -C merged -m tests/data/jskos-metadata.yaml --input-format jskos tests/data/jskos.ndjson

The solution of creating an additional YAML file with prefixes is not convenient but doable, and we may provide a way to generate it automatically for the most common vocabularies. Adding more information about the mapping set (if available!) would require first mapping JSKOS to YAML metadata and then JSKOS to SSSOM.

Things I stumbled upon:

  • SSSOM (TSV) lacks mapping identifiers? We manage JSKOS mappings in a database (see api endpoint), so a unique identifier for each mapping (= row in SSSOM TSV) is important.
  • It looks like mapping_set_id is set to a random value by default (UUID?). Can this be disabled if there is no URI of the mapping set as a whole?
  • skos:mappingRelation, the default mapping property used in JSKOS, is not included in SSSOM list of properties.
  • Records that are not convertible are just skipped. Should parsing be stopped or a warning be given?
  • There is mapping_cardinality but only 1-to-1 mappings are supported?
  • We have author/creator labels, but these are attached to person identifiers. Having both author_id and author_label does not work when only some people have both URI and label, because then it's not clear which label belongs to which URI. Maybe it's an edge case and we only provide the first author in this case?
  • Where to put documentation of what needs to be known when converting JSKOS <-> SSSOM?

nichtich avatar Feb 07 '23 14:02 nichtich

@nichtich I am super glad you are taking the time to deal with all this. This is super useful for the evolution of SSSOM as well.

SSSOM (TSV) lacks mapping identifiers? We manage JSKOS mappings in a database (see api endpoint), so a unique identifier for each mapping (= row in SSSOM TSV) is important.

There were long debates about this. Basically you have two options:

  1. Every mapping gets a unique ID. This is very nice for provenance and makes the mapping easier to use.
  2. Every mapping has a primary key that is composed of multiple slots/columns. So basically, if a mapping is defined by subject_id, predicate_id and object_id (primary key), you do not need a specific named key. The advantage of this is that the exact same mapping coming from different sources can be recognised as being the same.
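To make the second option concrete, here is a small sketch of a composite key in plain Python. The helper names `mapping_key` and `merge_by_key` are hypothetical and not part of sssom-py; mappings are modelled as simple dictionaries for the example.

```python
from typing import Dict, Iterable, List, Tuple

Mapping = Dict[str, str]
Key = Tuple[str, str, str]


def mapping_key(mapping: Mapping) -> Key:
    """Composite primary key: the (subject_id, predicate_id, object_id) triple."""
    return (mapping["subject_id"], mapping["predicate_id"], mapping["object_id"])


def merge_by_key(mappings: Iterable[Mapping]) -> List[Mapping]:
    """Keep the first occurrence of each triple, regardless of its source."""
    seen: Dict[Key, Mapping] = {}
    for mapping in mappings:
        seen.setdefault(mapping_key(mapping), mapping)
    return list(seen.values())
```

Two rows that share the same triple but come from different files collapse into one entry, which is the recognition property described above. A per-row identifier from a database would have to be carried in a separate column.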

It looks like mapping_set_id is set to a random value by default (UUID?) Can this be disabled if there is no URI of the mapping set as a whole?

mapping_set_id and license are considered "required" by the SSSOM data model to comply with general FAIR data concerns. If you want to drop these you will have to use grep -v, but I would recommend thinking about embracing them.

skos:mappingRelation, the default mapping property used in JSKOS, is not included in SSSOM list of properties.

In SKOS, we have "The SKOS mapping properties are skos:closeMatch, skos:exactMatch, skos:broadMatch, skos:narrowMatch and skos:relatedMatch. These properties are used to state mapping (alignment) links between SKOS concepts in different concept schemes, where the links are inherent in the meaning of the linked concepts." - so skos:mappingRelation is explicitly not included in the list of "mapping relations". What mapping semantics are you seeking to express using it?

Records that are not convertible are just skipped. Should parsing be stopped or a warning be given?

Interesting, I don't know yet. Maybe logging.warning()?
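A minimal sketch of what the logging.warning() route could look like when reading NDJSON line by line. The function name `iter_convertible` is hypothetical, and the JSKOS field layout used here (`from`/`to` with a `memberSet` list of objects carrying `uri`) is an assumption for illustration; this is not the code from the draft commit.

```python
import json
import logging
from typing import Dict, Iterable, Iterator

logger = logging.getLogger(__name__)


def iter_convertible(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield subject/object pairs; log a warning and skip records that cannot be read."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are not records
        try:
            record = json.loads(line)
            # Assumed JSKOS layout: first member of the "from" and "to" bundles.
            subject = record["from"]["memberSet"][0]["uri"]
            obj = record["to"]["memberSet"][0]["uri"]
        except (json.JSONDecodeError, KeyError, IndexError, TypeError) as error:
            logger.warning("Skipping record on line %d: %r", number, error)
            continue
        yield {"subject_id": subject, "object_id": obj}
```

Logging instead of raising keeps a large import going while still leaving a trace of every dropped line; a strict mode that raises on the first bad record could be layered on top if stopping is preferred.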

There is mapping_cardinality but only 1-to-1 mappings are supported?

There are many meanings of 1-to-1, 1-to-n, but in essence, right now only 1-1 and 1-N disjunct mappings are allowed. We want to discuss a proposal for proper 1-to-n mappings in your sense in April: https://biocuration2023.github.io/workshops (called: "complex mappings").

We have author/creator labels, but these are attached to person identifiers. Having both author_id and author_label does not work when only some people have both URI and label, because then it's not clear which label belongs to which URI. Maybe it's an edge case and we only provide the first author in this case?

This is an excellent question and one that I have also contemplated a few times. I think the two fields should be mutually exclusive, but tbh, it's not worth imposing this. We should probably just say: "author_id" and "author_label" are treated as unrelated to each other. If you have a better idea, you could make a proposal on the SSSOM issue tracker.

Where to put documentation of what needs to be known when converting JSKOS <-> SSSOM?

We have never had this question before, but we always wanted to document the conversion processes better. Maybe in the official SSSOM docs, i.e. a new section/file in https://github.com/mapping-commons/sssom/tree/master/src/docs?

matentzn avatar Feb 09 '23 11:02 matentzn

skos:mappingRelation is explicitly not included in the list of "mapping relations". What mapping semantics are you seeking to express using it?

skos:mappingRelation is the super-property of all other SKOS mapping relations, so the semantics is "a SKOS mapping relation, but not specified which of them". Most legacy mappings in our domain don't have an explicit type, so this is the default.

nichtich avatar Feb 09 '23 12:02 nichtich

skos:mappingRelation is the super-property of all other SKOS mapping relations, so the semantics is "a SKOS mapping relation, but not specified which of them". Most legacy mappings in our domain don't have an explicit type, so this is the default.

This does make some sense, but may invite laziness. I think you can request skos:mappingRelation to be added to the standard properties on the SSSOM issue tracker and see what people say. I am 50/50 (pragmatically it's a good idea to add it, but idealistically maybe not so much).

matentzn avatar Feb 09 '23 14:02 matentzn

If you don't know what mapping type it is, then you can use rdfs:seeAlso. If you're a bit more sure that there is something going on, then skos:relatedMatch is the next more granular option before actually having to specify what the relation is.

cthoyt avatar May 12 '23 15:05 cthoyt

We regularly get lists of mappings without a mapping type (e.g. plain spreadsheets with two columns) and assign skos:mappingRelation in these cases. rdfs:seeAlso makes sense as well, so the JSKOS reader should change skos:mappingRelation in JSKOS to rdfs:seeAlso in SSSOM.
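A sketch of that rule, assuming the JSKOS `type` field is a list of URIs and that a missing or empty list means the JSKOS default skos:mappingRelation. The function name `jskos_type_to_predicate` is hypothetical and not part of sssom-py.

```python
from typing import Optional, Sequence

SKOS = "http://www.w3.org/2004/02/skos/core#"


def jskos_type_to_predicate(types: Optional[Sequence[str]]) -> str:
    """Map the first JSKOS mapping type URI to an SSSOM predicate_id."""
    # Assumption: no type given means the JSKOS default, skos:mappingRelation.
    uri = types[0] if types else SKOS + "mappingRelation"
    if uri == SKOS + "mappingRelation":
        return "rdfs:seeAlso"  # untyped mapping, as proposed in this thread
    if uri.startswith(SKOS):
        return "skos:" + uri[len(SKOS):]
    return uri  # unknown namespace: keep the full URI
```

Typed mappings such as skos:exactMatch pass through unchanged as CURIEs, so only the untyped legacy mappings are affected by the substitution.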

nichtich avatar May 15 '23 07:05 nichtich