specification Quad support in Solid

Quad support in Solid

Open kjetilk opened this issue 3 years ago • 11 comments

@rubensworks brought up the need for quads in #125 , and since it hasn't been discussed much, I wanted to open an issue to make sure the discussion has a home.

I don't think the graph name part of the quad is so much an afterthought in Solid; rather, there seem to be a number of constraints on it that are silently acknowledged. I think they need to be clearly formulated, so this is a start.

First, the original design of RDF was triple-based, and there is a strong history behind this from the knowledge representation field, as triples can be used to represent pretty much everything [citation needed]. A fourth term, initially usually called the context and now usually called the graph, was added to make it easier to divide up the data set and talk about parts of it.

Eventually, SPARQL took this in fully, and SPARQL became defined in terms of quads, not just triples.

For Solid, the original idea of RDF as triples is strong, i.e. each individual resource has a representation that is just triples, not quads.

However, it is easy to identify a quad in this scheme too: The fourth component is the request URI identifying each resource. This seems to be one of the not-very-outspoken understandings of Solid. Thus, Solid can also be thought of as quad-based.

With these ideas, the following constraints apply:

Graphs must identify an information resource.
The URI of a graph should be dereferenceable.
The resource representation referenced by a graph is usually under access control.
Currently, the graph is the smallest unit of information that can be under access control.

That's at least what springs to mind, at present, and I don't think these are very unreasonable constraints. It would be interesting to have this formalized by the academic community.
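
The reading above, where the request URI supplies the fourth component, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are plain strings, `quads_for_resource` is a hypothetical helper (not part of any Solid library), and the http(s) check is a crude stand-in for "dereferenceable".

```python
from urllib.parse import urlparse

def quads_for_resource(request_uri, triples):
    """Attach the request URI of a resource as the graph name of its triples.

    Hypothetical helper: terms are plain strings, and the checks below are
    only rough stand-ins for the constraints listed above.
    """
    if request_uri.startswith("_:"):
        raise ValueError("graph name must identify a resource, not be a blank node")
    parsed = urlparse(request_uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("graph name should be a dereferenceable http(s) URI")
    return [(s, p, o, request_uri) for (s, p, o) in triples]

# Triples as they would appear in the representation of one resource.
triples = [("ex:trastuzumab", "ex:is-indicated-for", "ex:breast-cancer")]
quads = quads_for_resource("http://example.org/pub1/assertion", triples)
```

Access control is not modelled here; in this view it would apply to the resource named by the fourth component as a whole.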

Jul 27 '21 19:07 kjetilk

If you want to support storing Verifiable Credentials in a Pod as native RDF, I would suggest reconsidering these criteria.

In particular, the proof portion of a VC will typically be in a graph that is identified with a blank node. Other portions of that VC may also be identified by other, distinct blank nodes.

In order to retain the validity of the signature of this VC, the server may not modify those graph names, nor may it move those proof triples into the default graph.

Other JSON-LD 1.1 structures will have similar requirements.
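
Why the graph name matters for a signature can be illustrated with a deliberately simplified sketch. The hash below is an assumption standing in for real RDF dataset canonicalization (which also relabels blank nodes), and the proof triple and URIs are hypothetical; the point is only that the graph name is part of the data being signed.

```python
import hashlib

def dataset_digest(quads):
    """Hash a sorted, N-Quads-like rendering of the statements.

    Simplified stand-in for real RDF dataset canonicalization; it only
    shows that the graph name is part of the hashed data.
    """
    lines = sorted(" ".join(q) + " ." for q in quads)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

# Hypothetical proof triple in a graph named by a blank node.
original = [("_:proof", "ex:proofValue", '"abc"', "_:g0")]
# The same triple after a server renamed the graph to a resource URI.
renamed = [("_:proof", "ex:proofValue", '"abc"', "<http://example.org/pod/vc>")]
# The same triple moved into the default graph (no fourth term).
moved = [("_:proof", "ex:proofValue", '"abc"')]

print(dataset_digest(original) == dataset_digest(renamed))  # False
print(dataset_digest(original) == dataset_digest(moved))    # False
```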

Jul 28 '21 00:07 acoburn

The fourth component is the request URI identifying each resource.

Such an interpretation definitely makes sense IMO. The only problem with it (summarizing my previous comment) is that it excludes several use cases that really depend on the context interpretation of the graph (such as VC, Nanopublications, and RSP-QL). Furthermore, this interpretation would require strict limitations on serializations such as JSON-LD, N-Quads, and TriG, which may lead to confusion among developers coming from the broader RDF/Linked Data world.

So my view on this is that the vague semantics of the graph component has led to many different interpretations being attached to it over the years (semantic debt?). Therefore, I think we should not attach any special meaning to it ourselves, to avoid conflicts.

(For reference, I avoid using graphs myself as much as possible, for these exact reasons, but I aim to enable it whenever possible)

Jul 28 '21 06:07 rubensworks

Indeed, the vague semantics has led to many different interpretations, and that is a problem. I would like to understand these problems in more depth, but no rush.

However, these comments have me worried that something has become overcomplicated over the years. A proof is most certainly an information resource that must be possible to get over the network for it to be useful, and for which access controls may apply. This tension should not exist... I must say, from a quick look over Nanopublications and RSP-QL, I could also not spot where the problem might be.

It seems to me that introducing blank nodes for graph names would create many other problems. Why would you say that it MUST NOT have a global identifier? Also, the ability to sparql (verbing weirds language!) that graph seems like a large sacrifice.

Aug 02 '21 15:08 kjetilk

Tangent: SPARQL should remain a noun like SQL (which you don't see used in phrases like "the ability to sql that table"). Preferred would be "the ability to use SPARQL on" (or "the ability to use SPARQL to query that graph", or "the ability to query that graph with SPARQL", or "the ability to run SPARQL queries over that graph"), among other phrasings.

Aug 02 '21 17:08 TallTed

I have found that quad support in Solid could be useful for access control, and I made a proposal on that subject in #247, "support TriG serialization of Access Control Resources". The best use case there is in combination with a new wac:imports suggestion. I have implemented wac:imports in Reactive Solid with TriG support.

Aug 25 '21 17:08 bblfish

However, it is easy to identify a quad in this scheme too: The fourth component is the request URI identifying each resource. This seems to be one of the not-very-outspoken understandings of Solid. Thus, Solid can also be thought of as quad-based.

Can we model quad resources as a proxy over this model, then? That is, a resource type that is an LDP-NR (a non-RDF source in LDP terms): a virtual resource represented as a quad-based RDF dataset, acting as a proxy to other LDP-RSs. When we create such a virtual LDP-NR, it creates multiple LDP-RSs, each with a request IRI that is one of the graph names in the proxy resource description. We can limit operations on this proxy resource to READ, CREATE, and DELETE. On READ, the virtual resource can be reconstructed from the concrete LDP-RSs.

This would allow quad datasets like nanopubs to be understood by LDP. @rubensworks
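
A rough sketch of the proxy idea, assuming an in-memory dictionary in place of real LDP-RS storage and hypothetical function names: CREATE splits the dataset into one triple set per graph name, and READ rebuilds the dataset from those sets. Blank-node graph names and the default graph are not handled here.

```python
from collections import defaultdict

def split_by_graph(quads):
    """CREATE: group a quad dataset into one triple set per graph name."""
    resources = defaultdict(set)
    for s, p, o, g in quads:
        resources[g].add((s, p, o))
    return dict(resources)

def rebuild_dataset(resources):
    """READ: reconstruct the dataset from the per-resource triple sets."""
    return {(s, p, o, g) for g, triples in resources.items() for (s, p, o) in triples}

# Hypothetical two-graph dataset; each graph name doubles as a request IRI.
dataset = {
    ("ex:trastuzumab", "ex:is-indicated-for", "ex:breast-cancer",
     "http://example.org/pub1/assertion"),
    ("ex:assertion", "prov:wasDerivedFrom", "ex:experiment",
     "http://example.org/pub1/provenance"),
}

stored = split_by_graph(dataset)          # one entry per would-be LDP-RS
assert rebuild_dataset(stored) == dataset # READ round-trips the dataset
```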

Aug 30 '21 03:08 damooo

So, first of all, my intent was not to define quads in Solid in a novel way, just to codify the unspoken assumptions around quads as I see them right now. I don't think this is something we can prioritize, and even though I've tried to find time to look into the details that you've posted, I haven't found it. I suppose thinking in terms of a proxy can be helpful.

I would just like to show how the example in the Nanopubs spec could be done under the current model: (Prefixes omitted for brevity, some changes to syntax, because I feel too many colons make it hard to read)

Representation of http://example.org/pub1/Head :

  <.> a np:Nanopublication .
  <.> np:hasAssertion <./assertion> .
  <.> np:hasProvenance <./provenance> .
  <.> np:hasPublicationInfo <./pubinfo> .

Representation of http://example.org/pub1/assertion :

  ex:trastuzumab ex:is-indicated-for ex:breast-cancer .

Representation of http://example.org/pub1/provenance :

  <./assertion> prov:wasDerivedFrom <./experiment> ; 
     prov:wasAttributedTo orcid:0000-0003-3934-0072 .

Representation of http://example.org/pub1/pubinfo :

  <./> dct:creator orcid:0000-0003-0183-6910 .
  <./> dct:created "2020-07-10T10:20:22.382+02:00"^^xsd:dateTime  .

So, this is now 4 separate GET requests, but it encodes exactly the same graph as the original example, right? I don't think it is appropriate to rely on a particular serialization, as long as the same graph can be expressed.

Now, I happen to think that the example is already overcomplicated. I understand the desire to name parts of the graph, but I wouldn't have made that Head; I would have put it in the default graph, which would then naturally map to the container.

Triples vs. quads isn't about what you can express; it is a practical measure, and as a practical measure, different architectural assumptions make for different conclusions about what is practical.

Sep 03 '21 09:09 kjetilk

So, this is now 4 separate GET requests, but it encodes exactly the same graph as the original example, right?

It depends on the interpretation of the graph component. If you consider the graph to only indicate the resource location, and the contents of all graphs to be part of a union default graph, then this is correct.

But not everyone makes use of these semantics. E.g., for an ?s ?p ?o query, SPARQL will not return triples in named graphs.

Also, the graph component is allowed to be a blank node, which means that it can not always refer to a resource location.
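
The difference between the two interpretations of the default graph can be sketched as follows. This is a toy matcher over tuples, not a SPARQL engine; `match_all`, the `DEFAULT` marker, and the sample data are assumptions made for illustration.

```python
DEFAULT = None  # marker for the default graph in this sketch

def match_all(dataset, union_default_graph=False):
    """Emulate an `?s ?p ?o` pattern evaluated against the default graph.

    With union_default_graph=True, the default graph is treated as the
    union of all graphs, which is the reading discussed above.
    """
    return {
        (s, p, o)
        for (s, p, o, g) in dataset
        if union_default_graph or g is DEFAULT
    }

dataset = {
    ("ex:pub1", "rdf:type", "np:Nanopublication", DEFAULT),
    ("ex:trastuzumab", "ex:is-indicated-for", "ex:breast-cancer",
     "http://example.org/pub1/assertion"),
}

print(len(match_all(dataset)))                            # 1
print(len(match_all(dataset, union_default_graph=True)))  # 2
```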

Sep 03 '21 09:09 rubensworks

Also, the graph component is allowed to be a blank node, which means that it can not always refer to a resource location.

This is quite typical for Verifiable Credentials

Sep 03 '21 09:09 acoburn

It depends on the interpretation of the graph component. If you consider the graph to only indicate the resource location, and the contents of all graphs to be part of a union default graph, then this is correct.

Yes, and that's why I wanted to make that notion explicit.

But not everyone makes use of these semantics. E.g., for an ?s ?p ?o query, SPARQL will not return triples in named graphs.

Indeed, but there are two ways to fix that problem: On a Solid server, if you have a SPARQL evaluator on the backend, this is something you will encode into the SPARQL engine. If you do have a standalone server, it is more difficult, but if we make the notion explicit, it should be straightforward for a SPARQL engine to adopt the view as it queries over a pod.

Also, the graph component is allowed to be a blank node, which means that it can not always refer to a resource location.

Yup, but then, that's also not allowed in SPARQL, so there is a lot to lose with such a design. I'd rather suggest that the VC community should review that, @acoburn :-)

Sep 03 '21 11:09 kjetilk

I've been struggling lately with how Solid mixes client-managed and server-managed triples in container representations and (auxiliary) description resources. There is a practice of throwing statements from the Description Resource of the container into the container representation. By changing the Description Resource, the client can change the statements that are later included in the representation of the container.

I think it would be much cleaner not to just mix statements asserted in one resource (especially by clients) into another resource. Instead, clients could either make two separate requests to the container and its Description Resource to get all the data, or content negotiate for a quad-based representation that would include statements from both, but in distinctly named graphs, where the graph names would be the IRIs of the corresponding resources.

I don't want to dive here into that specific handling of containers; I'm just using it as an example where quad-based representations (only for GET) could help keep statements from different resources cleanly separated when there is a need to combine them into a single response. PUT / PATCH would still need to use triple-based formats and target the resource in which the triples are modified.
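
As a minimal sketch of such a combined GET response, the function below writes the triples of each resource as N-Quads, using the resource IRI as the graph name. Everything here is illustrative: `combined_nquads` is a hypothetical helper, the container and description IRIs are made up, and terms are assumed to be already written in N-Triples syntax.

```python
def combined_nquads(resources):
    """Serialize triples from several resources as N-Quads, one graph per resource.

    `resources` maps a resource IRI to triples whose terms are already
    written in N-Triples syntax (e.g. "<http://...>" or '"literal"').
    """
    lines = []
    for iri, triples in resources.items():
        for s, p, o in triples:
            lines.append(f"{s} {p} {o} <{iri}> .")
    return "\n".join(lines)

# Hypothetical container and its description resource.
resources = {
    "http://example.org/c/": [
        ("<http://example.org/c/>",
         "<http://www.w3.org/ns/ldp#contains>",
         "<http://example.org/c/a>"),
    ],
    "http://example.org/c/.meta": [
        ("<http://example.org/c/>",
         "<http://purl.org/dc/terms/title>",
         '"My container"'),
    ],
}

print(combined_nquads(resources))
```

A client reading this response can tell which statements came from the server-managed container and which from the client-managed description, because each statement carries the IRI of the resource that asserts it.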

I also keep my fingers crossed that Generalized RDF nuances will not be a roadblock to taking advantage of quads in some simple scenarios.

Sep 30 '22 22:09 elf-pavlik
