Level of SPARQL Update support

Open kjetilk opened this issue 4 years ago • 43 comments

A certain level of SPARQL Update support is expected in Solid, to be used with the PATCH method (#85). This discussion has begun, and this issue is to discuss some details that we need to decide upon.

The main questions are:

  1. What subset of SPARQL Update is suited as a minimal requirement for Solid?
  2. How does WAC apply to the minimal subset?
  3. How does WAC apply to SPARQL Update for implementations that will use a fully compliant SPARQL implementation?
  4. What would be the URI of SPARQL Endpoint(s)?
  5. Should any SPARQL Update operations be forbidden?
  6. Should complex HTTP Verbs rather be SPARQL operations?

A short overview of SPARQL Update

SPARQL Update has three operations that are of relevance to us, INSERT DATA, DELETE DATA, and INSERT/DELETE. SPARQL always operates over quads and quad patterns, whether they are quads that are passed directly as data to the two former operations, or used with the keywords WITH and USING to the INSERT/DELETE operation. In the context of Solid, each resource is represented with triples, and since the graph part is optional, we can safely ignore it for now.

INSERT DATA takes triples in their curly brackets, and RDF merges the triples into the resource. DELETE DATA deletes exactly the triples that it has in curly brackets if they exist. It is important to note that there are no variables with these two operations, so no pattern matching is going on. They are the simplest forms of SPARQL Update.

If pattern matching is required, i.e. you need variables, then those variables go in a WHERE clause, and thus the more advanced INSERT/DELETE operation must be used.

Minimal SPARQL Update requirement

Quite clearly, the two operations INSERT DATA and DELETE DATA have some interesting properties: supporting them does not require a query engine. It only requires that the RDF library can parse the queries, which is trivial since they contain just triples and no patterns, and that it can perform an RDF merge and delete triples. The DELETE DATA operation can't contain blank nodes, which simplifies things further. Moreover, since both operations can be performed in a single HTTP request, they can be implemented as an atomic operation with relative ease.
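As a rough illustration of why these two operations need no query engine, here is a minimal Python sketch that treats a resource as a set of ground triples. The names `Triple` and `apply_data_patch` are hypothetical and not taken from any Solid implementation, and applying the deletions before the insertions is an assumption of the sketch.

```python
from typing import FrozenSet, Set, Tuple

# A ground triple: subject, predicate, object as plain strings (hypothetical model).
Triple = Tuple[str, str, str]


def apply_data_patch(
    resource: FrozenSet[Triple],
    delete_data: Set[Triple],
    insert_data: Set[Triple],
) -> FrozenSet[Triple]:
    """Return the new state of the resource after DELETE DATA, then INSERT DATA.

    No variables and no pattern matching are involved: deletion is a set
    difference and insertion is a set union. Blank-node handling in the RDF
    merge is ignored here for simplicity.
    """
    # Building a new frozenset and swapping it in afterwards is one simple way
    # to keep the update all-or-nothing.
    return frozenset((resource - delete_data) | insert_data)


before = frozenset({("alice", "ex:age", "14"), ("alice", "ex:name", "Alice")})
after = apply_data_patch(
    before,
    # The second triple is not in the resource; deleting it is simply a no-op.
    delete_data={("alice", "ex:age", "14"), ("alice", "ex:nick", "Al")},
    insert_data={("alice", "ex:age", "15")},
)
```

Because the old state is never mutated, a server following this shape could replace the stored graph in one step once the whole patch has been computed.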

Once a WHERE clause is added, for the more complex INSERT/DELETE operation, pretty much a full SPARQL engine is required, with a much larger, almost complete parser and a query planner.

Thus, a requirement to support INSERT DATA and DELETE DATA in a single HTTP PATCH request seems like an attractive option.

WAC as applied to Minimal SPARQL Update

INSERT DATA seems clearly an acl:Append operation, and DELETE DATA is clearly an acl:Write operation.

The question is if acl:Read should also be required. Imagine a malicious user "Mallory": Mallory is authorized to write, but not to read, and does not particularly care if he destroys things, he just wants to check if certain triples were there. In that case, he can send the query

DELETE DATA {
  <alice/profile#me> ex:age 14 . 
}

The fear now would be that Mallory can figure out from the response that Alice was in fact 14 years old. With SPARQL as defined, this will have no effect, so it shouldn't be a problem. However, we have challenged this behaviour, so this may be a problem with Solid, that may be solved by requiring acl:Read to be able to perform a DELETE DATA operation.

The risk may be so remote that it isn't a real concern, but I think we need to discuss it.
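To make the concern concrete, here is a small Python sketch contrasting two hypothetical server behaviours. Under plain SPARQL semantics, deleting an absent triple is a silent no-op, so the response is the same either way. Under a stricter behaviour that fails on a missing triple (an assumption of this sketch, standing in for the challenged behaviour), the status code reveals whether the triple was present. The function names and the status codes 200 and 409 are illustrative only.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def delete_data_sparql(graph: Set[Triple], triple: Triple) -> int:
    """Plain SPARQL semantics: removing an absent triple is a no-op."""
    graph.discard(triple)
    return 200  # same status whether or not the triple existed


def delete_data_strict(graph: Set[Triple], triple: Triple) -> int:
    """Hypothetical stricter behaviour: fail if the triple is absent."""
    if triple not in graph:
        return 409  # the failure itself tells the caller the triple was absent
    graph.remove(triple)
    return 200


guess = ("alice/profile#me", "ex:age", "14")

# Mallory probes a resource he cannot read, once where the guess is right
# and once where it is wrong.
status_present = delete_data_strict({guess}, guess)
status_absent = delete_data_strict(set(), guess)
leaks = status_present != status_absent

same_present = delete_data_sparql({guess}, guess)
same_absent = delete_data_sparql(set(), guess)
```

In the strict variant the two status codes differ, which is exactly the signal Mallory is after; in the plain SPARQL variant they are indistinguishable.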

WAC applied to SPARQL as a whole

Some implementations may have a full SPARQL Engine available and will wish to use it. For them, we need to define how WAC applies. As above, INSERT is clearly an acl:Append operation, DELETE is clearly an acl:Write operation, but with the caveat above, it may also be an acl:Read operation. Whenever the WHERE clause is added, acl:Read would also be required. There is a long-term possibility that data could participate in the query without being exposed to the user, but let's only be concerned with the permission modes we currently have for now. Then, obviously, all the SPARQL read queries require acl:Read.
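The mapping described above can be sketched as a small function. This reflects only the proposal in this comment, not settled Solid behaviour. `required_modes` and its parameters are hypothetical names, and whether DELETE also needs acl:Read is exposed as a flag because that question is still open.

```python
from typing import Set


def required_modes(
    has_insert: bool,
    has_delete: bool,
    has_where: bool,
    delete_needs_read: bool = False,
) -> Set[str]:
    """Return the WAC modes a patch would need under the proposal above."""
    modes: Set[str] = set()
    if has_insert:
        modes.add("acl:Append")
    if has_delete:
        modes.add("acl:Write")
        if delete_needs_read:  # open question, see the Mallory example
            modes.add("acl:Read")
    if has_where:
        # Pattern matching reads the existing data.
        modes.add("acl:Read")
    return modes
```

For example, `required_modes(True, True, True)` yields all three modes, while a plain INSERT DATA needs only acl:Append.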

SPARQL Endpoint

Historically, SPARQL has been queried through a server-wide SPARQL Endpoint, but the PATCH use case typically makes every resource its own endpoint, and will only query data from that resource. This is a useful simplification, because it removes the need to use graph naming. This assumption may be relaxed in the future, but for now, I suggest we keep it that way.

Other SPARQL Update operations

SPARQL Update also defines operations LOAD, CLEAR, CREATE, DROP, COPY, MOVE and ADD. We might need a brief note on what to do with them.

COPY and MOVE operations

The COPY use case has been proposed in #19 , and a possible solution could be to use the SPARQL Update COPY operation instead of a protocol verb. Similar with MOVE.

Forbidden SPARQL Update operations?

Most of the other operations map trivially to HTTP methods as defined in Solid through LDP. It may be problematic to support them, as WAC must be applied in a consistent manner, and failure to do so may cause leaks. OTOH, those who have a full SPARQL engine may find it bothersome if they cannot use them. We need to define the behaviour.

kjetilk avatar Nov 26 '19 22:11 kjetilk

Couple of quick points from my side:

  • What would be the URI of SPARQL Endpoint(s)?

None.

As it stands, there is no notion of a SPARQL endpoint, in the sense of the SPARQL protocol (which would use GET or POST).

Rather, we are using the patch format with its MIME type application/sparql-update as one (mandatory?) accepted patch document of a PATCH operation.

  • What subset of SPARQL Update is suited as a minimal requirement for Solid?

Additional question: And what should happen when clients go outside of that subset?

  • Should complex HTTP Verbs rather be SPARQL operations?

No, not by default as the minimal interface, given that:

  • We do not use the SPARQL protocol, but rather the SPARQL UPDATE syntax and semantics.
  • Other patch documents such as Notation3 patches exist (support to be decided); SPARQL does not have a special relationship (other than that its support for patch documents might be mandatory).

Other question:

  • What semaphore semantics do we want? The current Solid draft spec deviates from the SPARQL UPDATE standard, which is—in my opinion—highly undesired.
    • My suggestion there would be to follow the SPARQL standard by default, but allow different behaviors, either through Link headers from the client, or by using a different patch body altogether (such as Notation3), for the semantics are still ours to define.

RubenVerborgh avatar Nov 26 '19 22:11 RubenVerborgh

As it stands, there is no notion of a SPARQL endpoint, in the sense of the SPARQL protocol (which would use GET or POST).

Right, a flaw in my mental model. Thanks for pointing that out.

  • What subset of SPARQL Update is suited as a minimal requirement for Solid?

Additional question: And what should happen when clients go outside of that subset?

:+1:

  • Should complex HTTP Verbs rather be SPARQL operations?

No, not by default as the minimal interface, given that:

* We do not use the SPARQL protocol, but rather the SPARQL UPDATE syntax and semantics.

Ah, but I think you misunderstood my point there. I'm not talking about HTTP verbs in relation to SPARQL Protocol, I'm talking about them in relation to Solid, like in the proposal to introduce HTTP Verb COPY from WebDAV in #19 . Another implementation option there might be to use the SPARQL Update syntax and semantics, not the WebDAV one.

* What semaphore semantics do we want? The current Solid draft spec deviates from the SPARQL UPDATE standard, which is—in my opinion—highly undesired.
  
  * My suggestion there would be to follow the SPARQL standard by default, but allow different behaviors, either through `Link` headers from the client, or by using a different patch body altogether (such as Notation3), for the semantics are still ours to define.

Yeah, it is a pain. I would like to add some more sophistication in SPARQL at this point, but it would take quite an effort to argue for that, I think.

Meanwhile, I would like to see the queries that are used, especially if the DELETE/INSERT/WHERE can be dropped in favour of DELETE DATA ; INSERT DATA.

kjetilk avatar Dec 19 '19 00:12 kjetilk

Since one of the most urgent decisions that we need from this is the minimal SPARQL Update requirement, I started to look into what could inform this decision. The TL;DR is: "Is it sufficient for a Solid server to support DELETE DATA and INSERT DATA query forms?"

I'd like to hear the input of @RubenVerborgh and @rubensworks , as it can be informed by the LDFlex work.

I also looked into rdflib, and found that it seems to check whether a statement has a blank node, interprets such a statement as a quad pattern, and so uses a WHERE clause: https://github.com/linkeddata/rdflib.js/blob/master/src/update-manager.ts#L776-L797 The key to understanding the requirement is therefore to see to what extent blank nodes are used in updates using rdflib.

kjetilk avatar Jan 14 '20 16:01 kjetilk

Currently, LDflex can also produce WHERE clauses for insertions and deletions. Several examples can be seen in the unit tests.

I do however think that it may be possible to disallow WHERE clauses, require the client to perform a query beforehand, and fill in all the triples that need to be mutated directly. In some cases, this could cause a blowup in the number of triples though, but this may be manageable in the context of solid.
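The "query first, then send ground triples" idea could look roughly like the Python sketch below. It assumes the client already holds a local copy of the resource as already-serialised terms. `ground_delete`, the `None` wildcard convention and the example data are all hypothetical. The sketch refuses blank nodes, since DELETE DATA cannot contain them.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]
# None acts as a wildcard, playing the role of a variable in a WHERE clause.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def ground_delete(local_copy: Set[Triple], pattern: Pattern) -> str:
    """Expand a single triple pattern into a DELETE DATA request body."""
    matched = sorted(
        triple
        for triple in local_copy
        if all(want is None or want == have for want, have in zip(pattern, triple))
    )
    for triple in matched:
        if any(term.startswith("_:") for term in triple):
            raise ValueError("DELETE DATA cannot contain blank nodes")
    if not matched:
        return ""  # nothing to delete, so no request is needed
    lines = [f"  {s} {p} {o} ." for s, p, o in matched]
    return "DELETE DATA {\n" + "\n".join(lines) + "\n}"


KNOWS = "<http://xmlns.com/foaf/0.1/knows>"
local_copy = {
    ("<#me>", KNOWS, "<#bob>"),
    ("<#me>", KNOWS, "<#carol>"),
    ("<#me>", "<http://xmlns.com/foaf/0.1/name>", '"Alice"'),
}
# Client-side equivalent of: DELETE WHERE { <#me> foaf:knows ?o }
update = ground_delete(local_copy, ("<#me>", KNOWS, None))
```

The generated body grows with the number of matches, which is the blowup mentioned above, and it is only as fresh as the client's local copy.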

rubensworks avatar Jan 15 '20 07:01 rubensworks

The TL;DR is: "Is it sufficient for a Solid server to support DELETE DATA and INSERT DATA query forms?"

I don't think so; the semaphore functionality is important to many Solid apps. See https://github.com/solid/specification/issues/139

RubenVerborgh avatar Jan 15 '20 09:01 RubenVerborgh

But isn't that orthogonal to the semaphore issue?

I just saw you restarted discussion in https://github.com/solid/solid-spec/pull/193 , I'll go over there.

kjetilk avatar Jan 15 '20 09:01 kjetilk

But isn't that orthogonal to the semaphore issue?

The current semaphore mechanism relies on INSERT … WHERE, in which the WHERE clause ensures the existence of one thing before writing another. The less related part is whether the semaphore should also work if there is more than one match to the WHERE clause (spec says yes, Tim says no).
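A toy Python sketch of that mechanism, assuming a resource modelled as a set of string triples and variables written as terms starting with `?`. `match_bgp` and `insert_where` are hypothetical names. The function returns the number of WHERE solutions and leaves it to the caller to decide whether zero, or more than one, should count as a failure, since that is the point under discussion.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_bgp(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return all variable bindings that satisfy a basic graph pattern."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                candidate = dict(binding)
                ok = True
                for p_term, t_term in zip(pattern, triple):
                    if p_term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(p_term, t_term) != t_term:
                            ok = False
                            break
                    elif p_term != t_term:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


def insert_where(graph: Set[Triple], template: List[Triple], where: List[Triple]) -> int:
    """Apply INSERT ... WHERE in place and return the number of WHERE solutions."""
    solutions = match_bgp(graph, where)
    for binding in solutions:
        for triple in template:
            graph.add(tuple(binding.get(term, term) for term in triple))
    return len(solutions)


claim = [("?task", "ex:claimedBy", "alice")]
guard = [("?task", "ex:status", "open")]

open_graph = {("task1", "ex:status", "open")}
matches_open = insert_where(open_graph, claim, guard)

empty_graph: Set[Triple] = set()
matches_empty = insert_where(empty_graph, claim, guard)
```

With no match nothing is written, which is what makes the WHERE clause usable as a guard. A server that wants stricter semaphore behaviour could reject the patch when the returned count is not exactly 1.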

RubenVerborgh avatar Jan 15 '20 10:01 RubenVerborgh

I have made a loose proposal to the SPARQL 1.2 CG mailing list, which I think would address the semaphore problem as well as the confidentiality problem: https://lists.w3.org/Archives/Public/public-sparql-12/2020Jan/0000.html

I suggest that further discussion is held in a query-panel repository (https://github.com/solid/process/issues/186) or in the SPARQL 1.2 CG as appropriate.

kjetilk avatar Jan 16 '20 12:01 kjetilk

Note: we might (or might not) want to move issues such as this one over there.

RubenVerborgh avatar Jan 16 '20 13:01 RubenVerborgh

Yeah, actually, my idea, which is codified in https://github.com/solid/process/pull/182, is that this is exactly the kind of overarching issue that should live in the spec repo, for the editors to track and move along the editors' project board, and for the panel to report progress on. Issues like "what permissions are required for different operations" will be opened by the panel on the panel repo board, and tracking each of those isn't the editors' task.

kjetilk avatar Jan 16 '20 14:01 kjetilk

A new Query Panel has been formed, and the issues from here have been detailed as individual issues there. There's also a gitter channel. Further detailed discussion should happen there.

This will now serve as the birds-eye view issue that serves as a contact point between the Query Panel and the Editors.

kjetilk avatar Jan 17 '20 10:01 kjetilk

Status right now is that the Solid Editors have prioritized this issue and we've put it in the consensus phase. Also, limited support for the WHERE clause is required, so I'll look into how we can constrain that. We don't have the query panel now, so the discussion will happen here.

kjetilk avatar Jun 24 '21 15:06 kjetilk

I feel like there is an agreement that at least INSERT/DELETE DATA is essential.

Implementation-wise, INSERT/DELETE WHERE with just BGPs in the WHERE clause should be fairly simple to support. More expressive clauses such as FILTER, UNION, OPTIONAL, ... are a lot more complex.

So one solution could be to say that:

  • Servers MUST support INSERT/DELETE DATA.
  • Servers MUST support INSERT/DELETE WHERE with just BGPs in the WHERE clause.
  • Servers MAY support the full SPARQL 1.1 Update syntax (also includes LOAD, DROP, ...).

For this last point, some kind of announcement mechanism may be desired, so that clients can discover this functionality and exploit it.

rubensworks avatar Jun 24 '21 15:06 rubensworks

Yes, that is basically my plan. I have to look into how we can define it rigorously.

kjetilk avatar Jun 24 '21 20:06 kjetilk

Also see the discussion at https://github.com/solid/community-server/pull/786#discussion_r658847207, where we wonder about how a server can advertise their support of optional parts.

RubenVerborgh avatar Jun 25 '21 15:06 RubenVerborgh

  • Servers MUST support INSERT/DELETE WHERE with just BGPs in the WHERE clause

This seems like a reasonable minimum.

  • Implementer feedback: Fairly simple to support
  • Developer benefit: Can avoid running updates locally and pushing results in a respectable set of use cases ✅

For this last point, some kind of announcement mechanism may be desired, so that clients can discover this functionality and exploit it.

Agree this would be valuable 👍 This pattern could possibly be leveraged beyond query, letting the server advertise other types and/or levels of support.

justinwb avatar Jun 25 '21 18:06 justinwb

Minimal subset

I think we already understand that for SPARQL Update for the HTTP PATCH method, we allow INSERT DATA, DELETE DATA and the INSERT ... WHERE queries.

I had written a lengthy dissertation about how we should define this in terms of the SPARQL Grammar, and how that was made difficult by the SPARQL spec, and then I came to remember that @ericprud had already done that.

I think we should adopt @ericprud 's proposal and extend that with INSERT DATA and DELETE DATA. ericprud suggests a different media type for it, I don't think that's necessary, but it could also be a hook for the semaphore mechanism.

SPARQL Dataset considerations

Clearly, the graph that will be modified is the target resource of the PATCH, and no other. For the minimal subset, I think the consequence of that is pretty clear: We remove anything that has to do with quads from the minimal subset, the minimal subset is triples and triple patterns only, in line with @ericprud 's proposal. Agreed?

However, for full SPARQL Update there is a further concern as it provides a way to modify a different resource than the target resource. For example, it can be done using the WITH keyword or the GRAPH keyword in a Quad Pattern. What should we do about that?

I think we should ban the WITH keyword altogether, whether or not a full SPARQL Update is used or just the subset. IMHO, using the WITH keyword is something for when SPARQL is used with the SPARQL Protocol, not when it is used as the body of a PATCH.

If the WITH keyword is allowed, then we need to have security considerations that the client is authorized to perform the rest of the query. It doesn't seem to be worth the effort, given that it also addresses a different resource than the target.

There is a similar concern for the USING keyword and for using GRAPH in the WHERE clause, but as these are read operations, it is easier to handle and it is philosophically easier to draw on data outside of the target resource in a WHERE clause than to modify it.

It also applies to LOAD, CLEAR, etc, as they too name graphs.

As a matter of principle, should we state that the PATCH method can only modify the target resource? (container may be updated, of course)

Also note that SPARQL 1.1 mentions the use of PATCH and suggests 422 as the response when a query tries to modify a different resource.

Access Control

For INSERT clearly Append, for DELETE we need to resolve #220, but I think it is pretty clear now that DELETE requires Read and Write within current Solid. It does also not appear to be limited to the use of the WHERE clause unless we limit the semaphore to only apply to that.

We also need to figure out the access controls for the other SPARQL Update operations. I believe we should solve this by generic language in the protocol that we can refer to.

Language discovery

I think having a profile parameter on Accept-Patch is an interesting idea, but I'll note that SPARQL 1.1 has a Service Description. It was primarily intended for endpoints, so it isn't that well geared for our use case, but it may be worth seeing if we can reuse things from there. We could define a subclass of sd:Language to use somehow.

Response when outside the subset

Here I'm just interested in hearing what CSS does when a query is outside of the implemented subset. If we do a separate media type for it, then a 415 is an appropriate response, otherwise perhaps a 422.
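One way such a check could be wired up is sketched below in Python. It is an illustration only and does not describe what CSS does. It assumes a single media type, with 415 for a wrong Content-Type and 422 for a body that uses graph-related or graph-management keywords. `precheck_status` is a hypothetical name, and the keyword scan is deliberately naive (it ignores comments and single-quoted strings); a real server would decide this from its parser.

```python
import re
from typing import Optional

SPARQL_UPDATE = "application/sparql-update"

# Keywords that name graphs or manage them, and so could reach beyond the target resource.
GRAPH_KEYWORDS = ("WITH", "USING", "GRAPH", "LOAD", "CLEAR", "CREATE", "DROP", "COPY", "MOVE", "ADD")

# Match a keyword only when it is not part of a prefixed name or a variable.
_KEYWORD_RE = re.compile(
    r"(?<![\w:?$-])(" + "|".join(GRAPH_KEYWORDS) + r")(?![\w:-])",
    re.IGNORECASE,
)
# IRIs and double-quoted strings are blanked out before scanning.
_IRI_OR_STRING_RE = re.compile(r'<[^<>\s]*>|"(?:[^"\\]|\\.)*"')


def precheck_status(content_type: str, body: str) -> Optional[int]:
    """Return an error status for the patch, or None if processing may continue."""
    if content_type.split(";")[0].strip().lower() != SPARQL_UPDATE:
        return 415  # unsupported patch media type
    stripped = _IRI_OR_STRING_RE.sub(" ", body)
    if _KEYWORD_RE.search(stripped):
        return 422  # well-formed SPARQL Update, but outside the accepted subset
    return None
```

If a separate media type were introduced for the subset, the second check would collapse into the first and 415 would cover both cases.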

Going forward

I would have preferred a query panel at this point, but since that was closed, I think we should indicate consensus on the various parts here, so that we can open issues for drafting later.

kjetilk avatar Jul 01 '21 10:07 kjetilk

We remove anything that has to do with quads from the minimal subset, the minimal subset is triples and triple patterns only, in line with @ericprud 's proposal. Agreed?

One downside of a triple-based subset is that it will be impossible to modify resources containing multiple graphs, which is possible with TriG, N-Quads, and JSON-LD documents. While a triple-based subset is probably sufficient for the majority of use cases, this restriction may cut off many important use cases. Several domains explicitly require the use of named graphs for their functionality, such as Nanopublications and RSP-QL, both of which may benefit from Solid.

Therefore, I would suggest to allow named graphs explicitly. (AFAIK, CSS allows this)

As a matter of principle, should we state that the PATCH method can only modify the target resource? (container may be updated, of course)

:+1:

I would have preferred a query panel at this point, but since that was closed, I think we should indicate consensus on the various parts here, so that we can open issues for drafting later.

Would it make sense to have a temporary short-lived panel for this? I think this issue is significant enough to have some calls about.

In any case, I agree on the separate parts for issues.

rubensworks avatar Jul 01 '21 12:07 rubensworks

Therefore, I would suggest to allow named graphs explicitly.

Also note that this is part of a larger discussion of whether Solid will be RDF 1.0, 1.1, RDF-star, N3. (Not having that discussion here though.)

RubenVerborgh avatar Jul 01 '21 12:07 RubenVerborgh

Therefore, I would suggest to allow named graphs explicitly. (AFAIK, CSS allows this)

Currently it doesn't, only BGPs in the default graph are accepted. But this is an artificial limitation to be in line with what was discussed at that point. It would be easy to support multiple graphs, assuming those quads are contained within a single resource. Using graphs to patch multiple different resources simultaneously would be a bigger issue.

joachimvh avatar Jul 01 '21 12:07 joachimvh

Yeah, in the future, we might want full quad semantics for every resource, but we are quite far from that currently. That is, we kind of have quad semantics, it is just that the target resource identifies the default graph for all operations.

I don't think we can change that now, to the contrary, in the discussion above, I propose to ban the explicit graph...

kjetilk avatar Jul 01 '21 13:07 kjetilk

I just quickly skimmed the Nanopub examples and the RSP-QL paper. My opinion is that the only constraint imposed by Solid on a fundamental level is that the graph has to be an information resource (because that is a Solid Resource), but it seems that should not be a significant obstacle for Nanopub. Their recommendation on a quad-based serialization is a bigger obstacle. For RSP-QL, which seems more generic, it could be a more significant problem, but then again, it is likely to be more practical issues.

I think we could very well say in a best-practices document that you should identify the resource with the graph if you desire quad semantics, but relaxing the constraint that the graph must be an information resource is something I think is better left for later, we shouldn't attempt to tackle that now.

I hope (more) people can indicate whether they think that the key topics here can be agreed on:

  1. subset definition is OK (i.e. @ericprud 's proposal + INSERT DATA and DELETE DATA with triple semantics).
  2. That PATCH must only modify the target resource, with the implications that have on the use of graphs for full SPARQL Update implementations.
  3. #220

And then, I'd love to hear more about your experiences regarding discovery and response when the query is outside of the subset.

BTW, I'm going on vacation for a while now, so I will probably not be responding for a while. :-)

kjetilk avatar Jul 02 '21 14:07 kjetilk

but relaxing the constraint that the graph must be an information resource is something I think is better left for later, we shouldn't attempt to tackle that now.

On the one hand, I agree this makes sense for reducing complexity at this stage.

On the other hand, I noticed in several research-based cases that a pure triple-based approach was chosen first, with the assumption that quads could be added in the future. But in reality, quad support was never added, or only added afterwards in an inconvenient manner.

So we should just be careful that quad support is not seen as just an afterthought.

(Perhaps we should move this discussion to a separate issue? Perhaps one even exists already?)

rubensworks avatar Jul 02 '21 15:07 rubensworks

Yeah, please open a separate issue about it, I am not aware of any. :-)

kjetilk avatar Jul 02 '21 16:07 kjetilk

Pardon a brief interruption from a clueless implementer. :-) How does the following align with your intentions for the spec?

Expected returns from Patch Update
 * 200, insert to existing resource
 * 200, insert to non-existing resource creates resource
 * 200, delete existing triple from existing resource
 * 200, delete+where on existing triple from existing resource
 * 200, insert+delete on existing resource and existing triples
 * 200, insert+delete+where on existing resource and existing triples
 * 400, patch document syntax error
 * 400, patch does not contain an insert or a delete
 * 404, delete attempted on non-existing resource
 * 409, delete attempted on non-existing triple
 * 409, patch could not be applied
 * 409, attempted patch on non-RDF resource
 * 409, attempted patch on Container
 * 415, unsupported patch content-type
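A few rows of this list can be written down as a decision function, which may help when comparing implementations. The Python sketch below covers only the 200, 400 (no insert or delete), 404 and 409 (missing triple) rows for ground triples. `patch_status` is a hypothetical name, and the order of the checks is an assumption.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def patch_status(
    resource: Optional[Set[Triple]],
    deletes: Set[Triple],
    inserts: Set[Triple],
) -> int:
    """Return the expected status for a patch; resource is None if it does not exist."""
    if not deletes and not inserts:
        return 400  # patch does not contain an insert or a delete
    if resource is None:
        # Insert to a non-existing resource creates it; delete cannot succeed.
        return 404 if deletes else 200
    if not deletes <= resource:
        return 409  # delete attempted on a non-existing triple
    return 200
```

The function only decides the status and does not modify the resource. Syntax errors, non-RDF resources, containers and content-type checks would have to be handled before it is reached.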

[Edit : accept delete + where]

jeff-zucker avatar Jul 05 '21 16:07 jeff-zucker

I'm leaving 401 and 403 off for now, as it sounds like there are undecided issues about relationship of access modes to patch.

jeff-zucker avatar Jul 05 '21 17:07 jeff-zucker

Hi @jeff-zucker , sorry for the long ping times, there has been vacation time. :-) But yeah, those status codes sound just about right. Nothing wrong that springs to mind, at least. I hope to advance this soon, to better support you implementers. :-)

kjetilk avatar Jul 27 '21 18:07 kjetilk

The Solid Editors Meeting today, with @timbl , @csarven and @kjetilk present resolved:

  1. We adopt @ericprud 's proposed subset, but add INSERT DATA and DELETE DATA.
  2. That PATCH must only modify the target resource, which implies that full SPARQL Update implementations cannot use the graph features to modify other graphs.
  3. We acknowledge that the graph features of SPARQL are useful, but encourage implementers to use the SPARQL Protocol for full SPARQL requirements.

We didn't come to consensus on discovery, access control and #220 . The latter can be decided separately, access control is something we can decide in drafting phase and I would like to hear more experience on discovery. Thus, I advance this issue to rough consensus.

kjetilk avatar Jul 27 '21 18:07 kjetilk

The Solid Editors Meeting today, with @timbl , @csarven and @kjetilk present resolved:

  1. We adopt @ericprud 's proposed subset, but add INSERT DATA and DELETE DATA.

I just touched that proposal page to fix the link to an HTML-ized RFC.

Feel free to edit that proposal and the associated yacker. If you want to keep the orig grammar around for posterity, you can save the edited grammar under a new name. One way to decide whether to duplicate is whether someone somewhere would want to instantiate that proposal without INSERT DATA and DELETE DATA. I kinda doubt it but leave it to your discretion.

ericprud avatar Aug 03 '21 07:08 ericprud

Because it's important to keep things clear, and wiki pages should always be considered moving targets, and future wiki edits might change the SPARQL UPDATE subset on that page...

Subset page as resolved for adoption -- https://www.w3.org/2001/sw/wiki/index.php?title=SparqlPatch&oldid=4800

Today's tweaked page (which did not change the SPARQL UPDATE subset) -- https://www.w3.org/2001/sw/wiki/index.php?title=SparqlPatch&oldid=5335

TallTed avatar Aug 03 '21 14:08 TallTed