dlite
Ontologise the relations used in a collection
Update the built-in relations in collections. Suggested changes:
(<label>, _is-a, Instance) --> (<label>, dm:instanceOf, dm:Instance)
(<label>, _has-uuid, <uuid>) --> (<label>, dm:hasUUID, <uuid>)
(<label>, _has-meta, <uri>) --> (<label>, dm:hasMeta, <uri>)
For transactions, we would additionally add the following relation:
(<label>, dm:hasHash, <hash>)
The "dm" prefix is http://emmo.info/datamodel/0.0.2/datamodel#. This prefix should be declared when a collection is serialised to Turtle.
Any thoughts?
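To make the prefix proposal concrete, here is a minimal sketch of how a prefixed name such as dm:hasUUID expands to a full IRI, in the same way a Turtle parser resolves names declared with @prefix. The function expand_prefixed_name() and the one-entry prefix table are hypothetical illustrations, not part of dlite.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A prefix/namespace pair, as declared with @prefix in Turtle. */
typedef struct {
  const char *prefix;     /* e.g. "dm" */
  const char *namespace_; /* full namespace IRI */
} prefix_t;

/* Only the "dm" prefix is taken from the proposal; the table is illustrative. */
static const prefix_t prefixes[] = {
  {"dm", "http://emmo.info/datamodel/0.0.2/datamodel#"},
};

/* Hypothetical helper: expand a prefixed name like "dm:hasUUID" into a full
   IRI written to `buf`. Returns 0 on success, -1 if the name has no colon,
   the prefix is unknown or `buf` is too small. */
static int expand_prefixed_name(const char *name, char *buf, size_t size)
{
  const char *colon = strchr(name, ':');
  size_t i, n = sizeof(prefixes) / sizeof(prefixes[0]);
  assert(name && buf && size > 0);
  if (!colon) return -1;
  for (i = 0; i < n; i++) {
    size_t plen = strlen(prefixes[i].prefix);
    /* The part before the colon must match the registered prefix exactly. */
    if ((size_t)(colon - name) == plen &&
        strncmp(name, prefixes[i].prefix, plen) == 0) {
      int m = snprintf(buf, size, "%s%s", prefixes[i].namespace_, colon + 1);
      return (m >= 0 && (size_t)m < size) ? 0 : -1;
    }
  }
  return -1;
}
```

With this, "dm:hasUUID" expands to http://emmo.info/datamodel/0.0.2/datamodel#hasUUID, while the collection itself can keep the short form.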
Two other alternatives could be to:
- implicitly assume the "http://emmo.info/datamodel/0.0.2/datamodel#" namespace and just write :instanceOf, :hasUUID, ...
- write out the full IRIs for all relations in the collections, like http://emmo.info/datamodel/0.0.2/datamodel#instanceOf, http://emmo.info/datamodel/0.0.2/datamodel#hasUUID, ...
Personally, I don't like either of these alternatives: the first because the namespace is implicit, and the second because it is too verbose and would clutter the triples in a collection, making them harder to read.
Isn't defining dm as a prefix enough?
Personally I would prefer to use a better name than hasMeta. What is the main reason for not just using subClassOf and the inverse superClassOf instead of is-a and has-meta?
Isn't defining dm as a prefix enough?
Yes, we can do that when serialising a collection to Turtle or other formats that support prefixes. For the internal representation within the collection, I think it is fine that dm is implicitly defined, since it is only used for a few hardcoded predicates and objects.
Personally I would prefer to use a better name than hasMeta. What is the main reason for not just using subClassOf and the inverse superClassOf instead of is-a and has-meta?
Good point. If we serialise both entities and instances as individuals, as suggested in https://github.com/emmo-repo/datamodel/pull/10, we can use rdf:type instead of is-a, and has-meta should be replaced with dm:instanceOf. We then end up with the following:
(<label>, rdf:type, dm:Instance) # former _is-a
(<label>, dm:hasUUID, <uuid>) # former _has-uuid
(<label>, dm:instanceOf, <uri>) # former _has-meta
(<label>, dm:hasHash, <hash>) # for transactions
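Existing collections would need their old built-in relations migrated to the new ones. A minimal sketch of such a migration step, assuming the mapping above (including the object Instance becoming dm:Instance for the former _is-a). The names migrate_predicate() and migrate_relation() are hypothetical, not dlite API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old underscore-prefixed built-in predicates and their proposed replacements. */
static const struct { const char *old_pred, *new_pred; } relation_map[] = {
  {"_is-a",     "rdf:type"},
  {"_has-uuid", "dm:hasUUID"},
  {"_has-meta", "dm:instanceOf"},
};

/* Hypothetical helper: return the new predicate for `pred`, or `pred` itself
   if it is not one of the old built-ins (user relations are left untouched). */
static const char *migrate_predicate(const char *pred)
{
  size_t i, n = sizeof(relation_map) / sizeof(relation_map[0]);
  assert(pred);
  for (i = 0; i < n; i++)
    if (strcmp(pred, relation_map[i].old_pred) == 0)
      return relation_map[i].new_pred;
  return pred;
}

/* Hypothetical helper: migrate predicate and object of one relation in place.
   Only the former _is-a relation also needs its object renamed. */
static void migrate_relation(const char **p, const char **o)
{
  assert(p && *p && o && *o);
  if (strcmp(*p, "_is-a") == 0 && strcmp(*o, "Instance") == 0)
    *o = "dm:Instance";
  *p = migrate_predicate(*p);
}
```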
Seems right to me.
The only possible issue is that labels are meant to be local to the collection, which allows simple labels like "energy". But that makes the relations invalid RDF, since the subject must be an IRI (or a blank node).
One solution could be to keep it like this, i.e. not enforce that the relations are valid RDF. If users need valid RDF, they should provide valid URIs as labels. Thoughts?
What we encode in the collection data model and how the information is represented in a triplestore are two different things. The collection YAML/JSON file needs to be "user-friendly". However, when the labels are encoded as triples, it makes sense to generate a URI of the form <namespace>/<collection-id>#label. This way we can query all labels and relations directly from the triplestore, no?
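A minimal sketch of the label conversion described here, in both directions, assuming the <namespace>/<collection-id>#label pattern. Both helpers, label_to_iri() and iri_to_label(), are hypothetical, and the namespace in the usage note is a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<ns>/<coll_id>#<label>" into buf.
   Returns 0 on success, -1 if buf is too small. */
static int label_to_iri(const char *ns, const char *coll_id, const char *label,
                        char *buf, size_t size)
{
  int n;
  assert(ns && coll_id && label && buf);
  n = snprintf(buf, size, "%s/%s#%s", ns, coll_id, label);
  return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical inverse: if iri starts with "<ns>/<coll_id>#", return a pointer
   to the label part inside iri (no copy); otherwise return NULL, meaning the
   IRI does not belong to this collection and should be kept as-is. */
static const char *iri_to_label(const char *ns, const char *coll_id,
                                const char *iri)
{
  size_t nlen, clen;
  assert(ns && coll_id && iri);
  nlen = strlen(ns);
  clen = strlen(coll_id);
  if (strncmp(iri, ns, nlen) != 0 || iri[nlen] != '/') return NULL;
  iri += nlen + 1;
  if (strncmp(iri, coll_id, clen) != 0 || iri[clen] != '#') return NULL;
  return iri + clen + 1;
}
```

For example, with the placeholder namespace http://example.com/colls and collection id 1234, the label energy maps to http://example.com/colls/1234#energy and back. Returning NULL for foreign IRIs is one way to keep the round trip lossless, which becomes important below.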
Yes, I fully agree that we want to keep the collection simple and "user-friendly". I think the collection backend in tripper might be the right place to do the conversion between the simple label and the qualified <namespace>/<collection-id>#label representations.
However, such conversions are not trivial, since we want to reproduce the exact content of the triplestore after a conversion to and back from the collection representation. A further complication is that objects can be either IRIs or literals, so we need a way to distinguish between the two cases.
Since we want labels as simple strings, like "energy", "inst1", etc., the best solution I see is to add support for prefixes in collections, combined with some conventions. A possible set of conventions could be:
1. a string starting with "<" (e.g. <xsd:double>3.14) <--> literal, with the type given within the <...>
2. a string starting with "_:" <--> blank node
3. a string starting with ":" (e.g. :force) <--> IRI with the default prefix
4. a string with no colon (typically a label, e.g. force) <--> IRI with the default prefix
5. a string of the form <prefix>:<name> <--> IRI with the given prefix
The default prefix is http://onto-ns.com/meta/0.1/Collection/<UUID>#, where <UUID> is the UUID of the collection. Conventions 2, 3 and 5 follow the Turtle standard. Convention 4 supports user-friendly labels.
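Because each convention can be recognised from the first characters of a string (plus a check for a colon), classification is cheap. A minimal sketch, assuming the five conventions exactly as listed; classify(), expand_default() and the kind_t enum are hypothetical names, not dlite API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Kinds of strings, numbered after the proposed conventions 1-5. */
typedef enum {
  KIND_LITERAL,        /* 1: "<type>value", e.g. "<xsd:double>3.14" */
  KIND_BLANK_NODE,     /* 2: "_:b0" */
  KIND_DEFAULT_PREFIX, /* 3: ":force" */
  KIND_LABEL,          /* 4: "force" (no colon) */
  KIND_PREFIXED_IRI    /* 5: "<prefix>:<name>", e.g. "dm:hasUUID" */
} kind_t;

/* Classify `s`; the checks are ordered so that earlier conventions win. */
static kind_t classify(const char *s)
{
  assert(s);
  if (s[0] == '<') return KIND_LITERAL;
  if (s[0] == '_' && s[1] == ':') return KIND_BLANK_NODE;
  if (s[0] == ':') return KIND_DEFAULT_PREFIX;
  if (!strchr(s, ':')) return KIND_LABEL;
  return KIND_PREFIXED_IRI;
}

/* Expand conventions 3 and 4 to a full IRI using the default prefix
   http://onto-ns.com/meta/0.1/Collection/<UUID>#.
   Returns 0 on success, -1 if `s` is of another kind or buf is too small. */
static int expand_default(const char *s, const char *uuid, char *buf, size_t size)
{
  const char *name;
  int n;
  assert(uuid && buf);
  switch (classify(s)) {
  case KIND_DEFAULT_PREFIX: name = s + 1; break; /* drop the leading ':' */
  case KIND_LABEL:          name = s;     break;
  default:                  return -1;
  }
  n = snprintf(buf, size, "http://onto-ns.com/meta/0.1/Collection/%s#%s",
               uuid, name);
  return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Note that under these rules a full IRI written out as http://... would fall under convention 5, with "http" as its prefix.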
Convention 1 is a new suggestion for distinguishing literals from IRIs. The "<" character is not allowed in a valid URL, so this convention makes it possible to tell whether a string is a literal or an IRI by checking only its first character. Having the value at the end of the string is convenient in C, since one can refer to the value simply by pointing to the first character after ">", without having to handle a substring of a given length or copy it into a newly allocated string. A language-tagged string literal would, e.g., start with "<@en>", while an unqualified literal would simply start with "<>". Using {...} to indicate a placeholder, this would result in the following relations for describing an instance in a collection:
({label}, rdf:type, dm:Instance) # former _is-a
({label}, dm:hasUUID, <>{uuid}) # former _has-uuid
({label}, dm:instanceOf, {uri}) # former _has-meta
({label}, dm:hasParent, <>{uuid}) # for transactions
({label}, dm:hasHash, <>{hash}) # for transactions
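The zero-copy argument for convention 1 can be illustrated with a short parser: the value is returned as a pointer into the original string, which is already NUL-terminated, so nothing is allocated or copied. The type, on the other hand, is not NUL-terminated and is returned with a length. parse_literal() is a hypothetical helper, not dlite API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parse a convention-1 literal "<type>value".
   On success:
     *type    points to the first character after '<' (NOT NUL-terminated),
     *typelen is the length of the type (0 for an unqualified "<>" literal),
     *value   points into `s` right after '>' (NUL-terminated, no copy).
   Returns 0 on success, -1 if `s` is not a literal. */
static int parse_literal(const char *s, const char **type, size_t *typelen,
                         const char **value)
{
  const char *end;
  assert(s && type && typelen && value);
  if (s[0] != '<' || !(end = strchr(s, '>'))) return -1;
  *type = s + 1;
  *typelen = (size_t)(end - s - 1);
  *value = end + 1;
  return 0;
}
```

For example, "<xsd:double>3.14" yields the type "xsd:double" (length 10) and a value pointer to "3.14", and "<@en>energy" yields the language tag "@en" and the value "energy".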
An alternative to convention 1 would be to use Turtle literals. They start with a double quote (") and are likewise easy to distinguish from IRIs, but the literal value is not terminated by a NUL character (the closing quote and any datatype or language tag follow it), which makes it more cumbersome and less efficient to work with in C. I therefore prefer convention 1 as suggested above.
Update:
Instead of inventing a new syntax for literals, I think it may be easier and cleaner to move the type of a literal object to a fourth field in the internal dlite representation of a triple. If the type is NULL, the object is an IRI; otherwise the object is a literal with the given type. An unqualified literal could be represented by an empty string (or by a special pointer value, like (void *)1).
For IRIs, we can keep conventions 2-4 from the comment above. This would result in the following internal representation:
({label}, rdf:type, dm:Instance, NULL) # former _is-a
({label}, dm:hasUUID, {uuid}, "") # former _has-uuid
({label}, dm:instanceOf, {uri}, NULL) # former _has-meta
({label}, dm:hasParent, {uuid}, "") # for transactions
({label}, dm:hasHash, {hash}, "") # for transactions
This will be easy to work with in C and to convert to standard RDF representations via tripper or Redland.
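A minimal sketch of the four-field representation and of how it could be turned into Turtle-like object syntax. The struct triple4_t and its helpers are hypothetical illustrations of the proposal, not dlite's actual triple struct; the sketch uses the empty string (not a special pointer value) for unqualified literals and does not escape quotes inside values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical internal triple with a fourth field for the literal type. */
typedef struct {
  const char *s; /* subject, e.g. a label such as "inst1" */
  const char *p; /* predicate, e.g. "dm:hasUUID" */
  const char *o; /* object: an IRI or the literal value */
  const char *d; /* NULL: object is an IRI; "": unqualified literal;
                    otherwise the datatype, e.g. "xsd:double" */
} triple4_t;

static int object_is_iri(const triple4_t *t)
{
  assert(t);
  return t->d == NULL;
}

static int object_is_untyped_literal(const triple4_t *t)
{
  assert(t);
  return t->d != NULL && t->d[0] == '\0';
}

/* Write the object as a prefixed name (IRI), "value" (unqualified literal)
   or "value"^^type (typed literal). Returns 0 on success, -1 if buf is
   too small. Quotes inside the value are NOT escaped in this sketch. */
static int format_object(const triple4_t *t, char *buf, size_t size)
{
  int n;
  assert(t && buf);
  if (object_is_iri(t))
    n = snprintf(buf, size, "%s", t->o);
  else if (object_is_untyped_literal(t))
    n = snprintf(buf, size, "\"%s\"", t->o);
  else
    n = snprintf(buf, size, "\"%s\"^^%s", t->o, t->d);
  return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With this layout, ({label}, rdf:type, dm:Instance, NULL) formats its object as dm:Instance, while a typed value such as 3.14 with type xsd:double becomes "3.14"^^xsd:double.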
Since pointers to heap-allocated type strings are word-aligned (a multiple of 4 on 32-bit systems and a multiple of 8 on 64-bit systems), a small integer value such as (void *)1 can never refer to such a string. This leaves plenty of room to encode common standard literal types directly in the pointer stored in the fourth field, allowing very fast lookup using switch().
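A minimal sketch of that idea: small integer codes stand in for common literal types, and a switch() on the pointer value dispatches without any string comparison. The codes, macros and enum names are hypothetical. The sketch assumes a platform where uintptr_t exists and small integers are never valid string addresses, which holds on common systems but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical codes for common literal types. Enum constants are used as
   case labels, since a pointer cast is not an integer constant expression. */
enum { DT_CODE_UNTYPED = 1, DT_CODE_DOUBLE = 2, DT_CODE_INTEGER = 3 };

/* Special pointer values to store in the fourth field of a triple. */
#define DT_UNTYPED ((const char *)(uintptr_t)DT_CODE_UNTYPED)
#define DT_DOUBLE  ((const char *)(uintptr_t)DT_CODE_DOUBLE)
#define DT_INTEGER ((const char *)(uintptr_t)DT_CODE_INTEGER)

typedef enum { OBJ_IRI, OBJ_UNTYPED, OBJ_DOUBLE, OBJ_INTEGER, OBJ_OTHER } objtype_t;

/* Classify the fourth field: NULL means IRI, small codes mean common
   literal types, and anything else is a real pointer to a type string. */
static objtype_t object_type(const char *d)
{
  switch ((uintptr_t)d) {
  case 0:               return OBJ_IRI;
  case DT_CODE_UNTYPED: return OBJ_UNTYPED;
  case DT_CODE_DOUBLE:  return OBJ_DOUBLE;
  case DT_CODE_INTEGER: return OBJ_INTEGER;
  default:              return OBJ_OTHER;
  }
}
```

Types that have no code, e.g. "xsd:dateTime", still work as ordinary string pointers and fall through to OBJ_OTHER, where a strcmp() would be needed.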