dlite Ontologise the relations used in a collection

Update the builtin relations in collections. Suggested changes:

(<label>, _is-a, Instance)   --> (<label>, dm:instanceOf, dm:Instance)
(<label>, _has-uuid, <uuid>) --> (<label>, dm:hasUUID, <uuid>)
(<label>, _has-meta, <uri>)  --> (<label>, dm:hasMeta, <uri>)

For transactions we would, in addition, add the following relation:

(<label>, dm:hasHash, <hash>)

The "dm" prefix is http://emmo.info/datamodel/0.0.2/datamodel#. This should be included when serialised in turtle.

Any thoughts?

Two other alternatives could be to:

1. implicitly assume the "http://emmo.info/datamodel/0.0.2/datamodel#" namespace and just write :instanceOf, :hasUUID, ...
2. write out the full IRIs for all relations in the collections, like http://emmo.info/datamodel/0.0.2/datamodel#instanceOf, http://emmo.info/datamodel/0.0.2/datamodel#hasUUID, ...

Personally, I don't like either of these two alternatives: the first because the namespace is implicit, and the second because it is too verbose and will clutter the triples in a collection, making them more difficult to read.

Nov 06 '21 09:11 jesper-friis

Isn't defining dm as a prefix enough?

Oct 31 '22 14:10 francescalb

Personally I would prefer to use a better name than hasMeta. What is the main reason for not just using subClassOf and the inverse superClassOf instead of is-a and has-meta?

Nov 03 '22 09:11 quaat

Isn't defining dm as a prefix enough?

Yes, we can do that when serialising a collection in turtle or other formats supporting prefixes. For the internal representation within the collection I think it is fine that dm is implicitly defined, since it is only used for a few hardcoded predicates and objects.
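To make the "implicitly defined" idea concrete, here is a minimal C sketch of how the dm prefix could be expanded when a collection is exported. The function name `expand_dm_prefix` is hypothetical and not part of the dlite API; only the namespace IRI is taken from this issue.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Namespace quoted in this issue for the "dm" prefix. */
#define DM_NAMESPACE "http://emmo.info/datamodel/0.0.2/datamodel#"

/* Hypothetical helper (not part of the dlite API): writes the full IRI
   for a term such as "dm:hasUUID" into `buf`.  Terms that do not start
   with "dm:" are copied unchanged.  Returns the number of characters
   needed, like snprintf(), so the caller can detect truncation. */
int expand_dm_prefix(const char *term, char *buf, size_t size)
{
  assert(term && buf);
  if (strncmp(term, "dm:", 3) == 0)
    return snprintf(buf, size, "%s%s", DM_NAMESPACE, term + 3);
  return snprintf(buf, size, "%s", term);
}
```

With this, the internal triples can keep the short dm:hasUUID form and only the serialiser needs to know the full namespace.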

Nov 19 '22 22:11 jesper-friis

Personally I would prefer to use a better name than hasMeta. What is the main reason for not just using subClassOf and the inverse superClassOf instead of is-a and has-meta?

Good point. If we serialise both entities and instances as individuals as suggested in https://github.com/emmo-repo/datamodel/pull/10, we can use rdf:type instead of is-a. And the has-meta should be replaced with dm:instanceOf. Then we end up with the following:

(<label>, rdf:type, dm:Instance)  # former _is-a
(<label>, dm:hasUUID, <uuid>)     # former _has-uuid
(<label>, dm:instanceOf, <uri>)   # former _has-meta
(<label>, dm:hasHash, <hash>)     # for transactions

Nov 19 '22 23:11 jesper-friis

Seems right to me.

Dec 13 '22 15:12 francescalb

The only possible issue is that the idea with the label is that it is local to the collection, hence allowing simple labels like "energy". But that makes the relations invalid RDF, since the subject should be a URI.

One solution could be to keep it like this, that is, to not require the relations to be valid RDF. Users who need valid RDF would then have to provide valid URIs as labels. Thoughts?

Dec 13 '22 19:12 jesper-friis

What we encode in the collection data model and how the information is represented in a triplestore are two different things. The collection yaml/json file needs to be "user-friendly". However, when the labels are to be encoded as triples, it makes sense to generate a URI of the form <namespace>/<collection-id>#label. This way we can query all labels and relations directly from the triplestore, no?
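A minimal C sketch of that mapping, in both directions, could look like the following. The function names and the example namespace used in the usage note are assumptions made for illustration; they are not taken from dlite or tripper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<namespace>/<collection-id>#<label>" in
   `buf`.  Returns the number of characters needed, like snprintf(). */
int qualify_label(const char *ns, const char *collection_id,
                  const char *label, char *buf, size_t size)
{
  assert(ns && collection_id && label && buf);
  return snprintf(buf, size, "%s/%s#%s", ns, collection_id, label);
}

/* Hypothetical inverse: if `iri` starts with `prefix`, returns a pointer
   to the local label inside `iri`, otherwise NULL. */
const char *local_label(const char *iri, const char *prefix)
{
  size_t n;
  assert(iri && prefix);
  n = strlen(prefix);
  return (strncmp(iri, prefix, n) == 0) ? iri + n : NULL;
}
```

For example, with the made-up namespace http://example.com/ns and collection id coll1, the label energy would become http://example.com/ns/coll1#energy, and `local_label()` with the prefix http://example.com/ns/coll1# would give back energy.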

Dec 14 '22 11:12 quaat

Yes, I fully agree that we want to keep the collection simple and "user-friendly". I think that the collection backend in tripper might be the correct place to do the conversion between the simple label and the qualified <namespace>/<collection-id>#label representations.

However, such conversions are not trivial, since we want to reproduce the exact content of the triplestore after a conversion to and back from the collection representation. A further complication is that objects can be either IRIs or literals, so we need a way to distinguish between the two cases.

Since we want labels as simple strings, like "energy", "inst1", etc..., the best solution I see for that is to add support for prefixes in collections combined with some conventions. A possible set of conventions could be:

1. string starting with "<" (ex: <xsd:double>3.14) <--> literal, the type is provided within the <...>
2. string starting with "_:" <--> blank node
3. string starting with ":" (ex: :force) <--> IRI with default prefix
4. string with no colon (typically a label, ex: force) <--> IRI with default prefix
5. string of the form <prefix>:<name> <--> an IRI with given prefix

The default prefix is http://onto-ns.com/meta/0.1/Collection/<UUID>#, where <UUID> is the UUID of the collection. Conventions 2, 3 and 5 follow the Turtle standard. Convention 4 supports user-friendly labels.
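As a rough illustration, the five conventions could be checked in C as sketched below. The enum and function names are hypothetical, and the order of the tests matters: the literal and blank-node checks must come before the generic colon check.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of a term string (not a dlite type). */
typedef enum {
  TERM_LITERAL,       /* convention 1: starts with "<" */
  TERM_BLANK_NODE,    /* convention 2: starts with "_:" */
  TERM_DEFAULT_IRI,   /* conventions 3 and 4: ":name" or no colon at all */
  TERM_PREFIXED_IRI   /* convention 5: "prefix:name" */
} TermKind;

/* Classifies `s` according to the conventions listed above. */
TermKind classify_term(const char *s)
{
  assert(s);
  if (s[0] == '<') return TERM_LITERAL;
  if (s[0] == '_' && s[1] == ':') return TERM_BLANK_NODE;
  if (s[0] == ':' || !strchr(s, ':')) return TERM_DEFAULT_IRI;
  return TERM_PREFIXED_IRI;
}
```

Note that in this sketch a full IRI such as http://example.com/x would be classified as a prefixed IRI with the prefix "http", so full IRIs would need an extra rule if they are to be supported.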

Convention 1 is a new suggestion here to distinguish literals from IRIs. The "<" is not allowed in a valid URL, so this suggestion makes it possible to check whether a string is a literal or IRI by just checking the first character. Having the value at the end of the string is convenient in C, since one can refer to the value by just pointing to the first character after ">", without having to consider a substring of a given length or copy it to a newly allocated string. A language-specific string literal would e.g. start with "<@en>" while an unqualified literal would simply start with "<>". Using {...} to indicate a placeholder, this would result in the following relations for describing an instance in a collection:

({label}, rdf:type, dm:Instance)             # former _is-a
({label}, dm:hasUUID, <>{uuid})              # former _has-uuid
({label}, dm:instanceOf, {uri})              # former _has-meta
({label}, dm:hasParent, <>{uuid})            # for transactions
({label}, dm:hasHash, <>{hash})              # for transactions
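The point about C can be sketched as follows: the value is obtained by pointing past the first ">", with no copying or allocation. The function name `literal_value` is hypothetical and only illustrates convention 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for the "<type>value" literal syntax of
   convention 1.  Returns a pointer to the NUL-terminated value inside
   `s`, or NULL if `s` is not a literal.  If `type` and `type_len` are
   not NULL, they receive the start and length of the type, which is
   not NUL-terminated (length 0 means an unqualified literal). */
const char *literal_value(const char *s, const char **type, size_t *type_len)
{
  const char *end;
  assert(s);
  if (s[0] != '<' || !(end = strchr(s, '>'))) return NULL;
  if (type) *type = s + 1;
  if (type_len) *type_len = (size_t)(end - s - 1);
  return end + 1;  /* value starts right after ">" */
}
```

For "<xsd:double>3.14" this yields the value "3.14" and the ten-character type xsd:double, while "<@en>hello" yields "hello" with the type @en.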

An alternative to convention 1 would be to use turtle literals. They start with a double quote (") and are likewise easy to distinguish from IRIs, but the actual literal value does not end with a NUL, making it more cumbersome and less efficient to work with in C. I therefore prefer convention 1 as suggested above.

Dec 14 '22 19:12 jesper-friis

Update: Instead of inventing a new syntax for literals, I think it may be easier and cleaner to move the type of a literal object to a fourth field in the internal dlite representation of a triple. If the type is NULL, the object is an IRI. Otherwise the object is a literal with the given type. An unqualified literal could be represented by an empty string (or a special pointer value, like (void *)1).

For IRIs, we can keep conventions 2-4 in the above comment. This would result in the following internal representation:

({label}, rdf:type, dm:Instance, NULL)           # former _is-a
({label}, dm:hasUUID, {uuid}, "")                # former _has-uuid
({label}, dm:instanceOf, {uri}, NULL)            # former _has-meta
({label}, dm:hasParent, {uuid}, "")              # for transactions
({label}, dm:hasHash, {hash}, "")                # for transactions

This will be easy to work with in C and convert to standard RDF representations via tripper or redland.
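A minimal sketch of such a four-field triple in C is shown below. The struct and function names are hypothetical and may differ from what ends up in dlite; only the meaning of the fourth field follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical four-field triple (names are illustrative only). */
typedef struct {
  const char *s;  /* subject */
  const char *p;  /* predicate */
  const char *o;  /* object: IRI or literal value */
  const char *d;  /* datatype: NULL for an IRI, "" for an unqualified
                     literal, otherwise the literal type */
} Triple4;

/* Returns non-zero if the object of `t` is a literal. */
int triple_is_literal(const Triple4 *t)
{
  assert(t);
  return t->d != NULL;
}

/* Returns non-zero if the object of `t` is an unqualified literal. */
int triple_is_plain_literal(const Triple4 *t)
{
  assert(t);
  return t->d != NULL && t->d[0] == '\0';
}
```

The relation ({label}, dm:hasUUID, {uuid}, "") would then be stored with `d` set to an empty string, and ({label}, rdf:type, dm:Instance, NULL) with `d` set to NULL.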

Since pointers to allocated objects are normally word-aligned (a multiple of 4 on 32-bit systems and a multiple of 8 on 64-bit systems), there is plenty of room to encode common standard literal types directly in the pointer value of the fourth field, allowing very fast lookup using switch().
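A sketch of this pointer-tagging idea is given below. The tag values and names are made up for illustration, and the code assumes a platform where the small integers 1 to 3 never occur as addresses of real strings and where integer-to-pointer casts round-trip, which holds on common systems but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags stored directly in the datatype pointer. */
enum {
  TAG_IRI = 0,      /* NULL: the object is an IRI */
  TAG_PLAIN = 1,    /* unqualified literal */
  TAG_INTEGER = 2,  /* xsd:integer */
  TAG_DOUBLE = 3    /* xsd:double */
};

/* Turns a non-zero tag into a fake pointer for the fourth field. */
const char *tag_ptr(int tag)
{
  assert(tag > TAG_IRI && tag <= TAG_DOUBLE);
  return (const char *)(uintptr_t)tag;
}

/* Maps the fourth field to a type name: NULL for an IRI, "" for an
   unqualified literal, a fixed name for tagged types, and the pointer
   itself when it refers to an ordinary type string. */
const char *datatype_name(const char *d)
{
  switch ((uintptr_t)d) {
  case TAG_IRI:     return NULL;
  case TAG_PLAIN:   return "";
  case TAG_INTEGER: return "xsd:integer";
  case TAG_DOUBLE:  return "xsd:double";
  default:          return d;  /* real pointer to a type string */
  }
}
```

Only tags up to 3 are used here so that the same values work with both 4-byte and 8-byte alignment.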

Jan 31 '23 17:01 jesper-friis
