sssom icon indicating copy to clipboard operation
sssom copied to clipboard

Update documentation for entity reference to clarify CURIE/URI type

Open matentzn opened this issue 1 year ago • 4 comments

Resolves https://github.com/mapping-commons/sssom-py/issues/512

  • [X] docs/ have been added/updated if necessary

The documentation for "entity references" is largely confusing, and leading, in its current form, to the expectation that metadata fields types with these can be either a CURIE or a full URI.

This is not quite right: in the TSV serialistion, as well as JSON, we expect the values to be CURIE-strings. This PR documents this design decision, but, as @joeflack4 puts it rightfully, this has a bit of a "smell" to it. Not sure what the right thing is here, but this documentation here is certainly better than not having it.

matentzn avatar Apr 08 '24 09:04 matentzn

Ok, lets replace this PR by a "see also" reference to the doc on serialisations.

That said, the current description of Entity reference is still terrible :P I will leave this in draft mode to fix it another time.

Thanks for weighing in!

matentzn avatar Apr 08 '24 10:04 matentzn

Note that I can’t say when I will complete the reorganisation of the docs, so if you don’t want to wait for that, you can add a note to the TSV section of the overview (which is the best/only thing we have as a “specification” for the TSV format, for now) to explicitly state that identifiers in a SSSOM/TSV file MUST be in CURIE form.

Something like

All identifiers in a SSSOM/TSV file (both in the metadata section and in the TSV columns) MUST be in CURIE form. The use of full-length identifiers is officially not supported.

That said, the current description of Entity reference is still terrible.

Agreed. “A reference to a mapped entity” is plain wrong, actually, since EntityReference is not only used for the subject_id and object_id, but for all cases where an identifier is required -- including identifiers to other things than the entities that are being mapped.

gouttegd avatar Apr 08 '24 10:04 gouttegd

I think I agree @gouttegd that the documentation for the model should be serialization agnostic. However, from a user experience perspective, I worry about the situation where someone goes to the documentation looking for an answer and they wind up looking at the documentation for the LinkML data model rather than something that is more relevant to the user; the serialization.

So I think I'll add a suggestion to your pull request, or it can be in a new PR, to update the documentation such that the docs for the serialization formats are front and center instead. And then I wonder if the LinkML documentation should be removed, or I suppose better yet, put aside in some developer documentation subsection.

I'm also fine with adding a "see also" linking to the serialization documentation, but with the changes I suggested above, that is less of a necessary / useful change than it otherwise would be.

joeflack4 avatar Apr 08 '24 20:04 joeflack4

I worry about the situation where someone goes to the documentation looking for an answer and they wind up looking at the documentation for the LinkML data model rather than something that is more relevant to the user

That’s definitely something that is likely to happen right now. In fact the documentation for the data model is currently the home page for the SSSOM website, which is very unfortunate and is one of the things I want to change (issue #330; changes are underway in that branch).

to update the documentation such that the docs for the serialization formats are front and center instead. And then I wonder if the LinkML documentation should be removed, or I suppose better yet, put aside in some developer documentation subsection.

That’s the plan. Won’t happen overnight, though.

gouttegd avatar Apr 08 '24 21:04 gouttegd

By the way, among the small things that need to be updated: the examples in the spec consistently use a full-length IRI for the creator_id slot, despite that slot being a EntityReference:

#creator_id: "https://orcid.org/0000-0002-7356-1779"
#curie_map: 
#  HP: "http://purl.obolibrary.org/obo/HP_"
#  MP: "http://purl.obolibrary.org/obo/MP_"
#  skos: "http://www.w3.org/2004/02/skos/core"
#license: "https://creativecommons.org/publicdomain/zero/1.0/"
#mapping_provider: "http://purl.obolibrary.org/obo/upheno.owl"
subject_id  predicate_id    object_id   mapping_justification   subject_label   object_label
HP:0009124  skos:exactMatch MP:0000003  Lexical Abnormal adipose tissue morphology  abnormal adipose tissue morphology
HP:0008551  skos:exactMatch MP:0000018  Lexical Microtia    small ears
HP:0000411  skos:exactMatch MP:0000021  Lexical Protruding ear  prominent ears

It’s bad enough for the spec never to explicitly say that full-length IRIs are not expected in the TSV format, but to give examples where full-length IRIs are actually used is actively misleading and is very bad.

gouttegd avatar Apr 11 '24 15:04 gouttegd

@ehartley @gouttegd thank you for your reviews.

matentzn avatar Apr 17 '24 10:04 matentzn