Update documentation for entity reference to clarify CURIE/URI type
Resolves https://github.com/mapping-commons/sssom-py/issues/512
- [X]
docs/have been added/updated if necessary
The documentation for "entity references" is largely confusing, and leading, in its current form, to the expectation that metadata fields types with these can be either a CURIE or a full URI.
This is not quite right: in the TSV serialistion, as well as JSON, we expect the values to be CURIE-strings. This PR documents this design decision, but, as @joeflack4 puts it rightfully, this has a bit of a "smell" to it. Not sure what the right thing is here, but this documentation here is certainly better than not having it.
Ok, lets replace this PR by a "see also" reference to the doc on serialisations.
That said, the current description of Entity reference is still terrible :P I will leave this in draft mode to fix it another time.
Thanks for weighing in!
Note that I can’t say when I will complete the reorganisation of the docs, so if you don’t want to wait for that, you can add a note to the TSV section of the overview (which is the best/only thing we have as a “specification” for the TSV format, for now) to explicitly state that identifiers in a SSSOM/TSV file MUST be in CURIE form.
Something like
All identifiers in a SSSOM/TSV file (both in the metadata section and in the TSV columns) MUST be in CURIE form. The use of full-length identifiers is officially not supported.
That said, the current description of Entity reference is still terrible.
Agreed. “A reference to a mapped entity” is plain wrong, actually, since EntityReference is not only used for the subject_id and object_id, but for all cases where an identifier is required -- including identifiers to other things than the entities that are being mapped.
I think I agree @gouttegd that the documentation for the model should be serialization agnostic. However, from a user experience perspective, I worry about the situation where someone goes to the documentation looking for an answer and they wind up looking at the documentation for the LinkML data model rather than something that is more relevant to the user; the serialization.
So I think I'll add a suggestion to your pull request, or it can be in a new PR, to update the documentation such that the docs for the serialization formats are front and center instead. And then I wonder if the LinkML documentation should be removed, or I suppose better yet, put aside in some developer documentation subsection.
I'm also fine with adding a "see also" linking to the serialization documentation, but with the changes I suggested above, that is less of a necessary / useful change than it otherwise would be.
I worry about the situation where someone goes to the documentation looking for an answer and they wind up looking at the documentation for the LinkML data model rather than something that is more relevant to the user
That’s definitely something that is likely to happen right now. In fact the documentation for the data model is currently the home page for the SSSOM website, which is very unfortunate and is one of the things I want to change (issue #330; changes are underway in that branch).
to update the documentation such that the docs for the serialization formats are front and center instead. And then I wonder if the LinkML documentation should be removed, or I suppose better yet, put aside in some developer documentation subsection.
That’s the plan. Won’t happen overnight, though.
By the way, among the small things that need to be updated: the examples in the spec consistently use a full-length IRI for the creator_id slot, despite that slot being a EntityReference:
#creator_id: "https://orcid.org/0000-0002-7356-1779"
#curie_map:
# HP: "http://purl.obolibrary.org/obo/HP_"
# MP: "http://purl.obolibrary.org/obo/MP_"
# skos: "http://www.w3.org/2004/02/skos/core"
#license: "https://creativecommons.org/publicdomain/zero/1.0/"
#mapping_provider: "http://purl.obolibrary.org/obo/upheno.owl"
subject_id predicate_id object_id mapping_justification subject_label object_label
HP:0009124 skos:exactMatch MP:0000003 Lexical Abnormal adipose tissue morphology abnormal adipose tissue morphology
HP:0008551 skos:exactMatch MP:0000018 Lexical Microtia small ears
HP:0000411 skos:exactMatch MP:0000021 Lexical Protruding ear prominent ears
It’s bad enough for the spec never to explicitly say that full-length IRIs are not expected in the TSV format, but to give examples where full-length IRIs are actually used is actively misleading and is very bad.
@ehartley @gouttegd thank you for your reviews.