Implications of semaphore mechanism for SPARQL Update (solid/specification)

Open kjetilk opened this issue 2 years ago • 8 comments

While writing up text on the semaphore mechanism from the RWW Design Issue, which is also discussed in #125, https://github.com/solid/solid-spec/pull/193, https://github.com/solid-archive/query-panel/issues/3 and elsewhere, I realized that I have not yet fully understood the problem.

Is the idea also that the same mechanism can be used where there is no actual concurrency? I could see a case for that: one client GETs a resource, looks at it, and then wants to change something with a PATCH, but meanwhile another client has changed the same resource. This is really the archetypal case for conditional requests, so I hadn't thought of it as a valid use of the semaphore mechanism. However, one could argue that the semaphore is actually the simpler mechanism: maintaining validators is more work for servers, and since a validator covers the entire document, it scales badly when many people co-edit a large document.
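To make the scaling concern concrete, here is a minimal sketch, assuming a server whose ETag is a hash over the whole document (a hypothetical choice; real servers may compute validators differently), graphs modelled as Python sets of (subject, predicate, object) string tuples, and hypothetical helpers `etag` and `conditional_patch`. It shows a per-document validator rejecting an edit that does not actually conflict with the other client's change.

```python
import hashlib

def etag(graph):
    """Hypothetical validator: a SHA-256 hash over the sorted triples of the whole document."""
    data = "\n".join(" ".join(triple) for triple in sorted(graph))
    return '"' + hashlib.sha256(data.encode("utf-8")).hexdigest() + '"'

def conditional_patch(graph, if_match, delete, insert):
    """Apply a PATCH only if If-Match still matches the document; return an HTTP status."""
    if if_match != etag(graph):
        return 412  # Precondition Failed: the document changed since the client's GET
    graph -= delete
    graph |= insert
    return 204

NAME_OLD = ("<#me>", "<#name>", '"Kjetil"')
NAME_NEW = ("<#me>", "<#name>", '"Ruben"')
doc = {NAME_OLD, ("<#me>", "<#color>", '"blue"')}

seen = etag(doc)  # client A GETs the document and remembers its ETag

# Client B changes an unrelated triple in the same document.
doc.discard(("<#me>", "<#color>", '"blue"'))
doc.add(("<#me>", "<#color>", '"red"'))

# Client A's rename is rejected although it does not touch the color.
status = conditional_patch(doc, seen, {NAME_OLD}, {NAME_NEW})
```

A check scoped to the triples being changed, as the semaphore mechanism is, would let this rename through, since `NAME_OLD` is still present.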

This issue has usually been discussed in terms of atomicity, including in rdflib's documentation. I have not quite managed to understand that angle in light of the SPARQL Update spec, which says:

If any solution produces a triple containing an unbound variable or an illegal RDF construct, such as a literal in a subject or predicate position, then that triple is not included when processing the operation: INSERT will not instantiate new data in the output graph, and DELETE will not remove anything.

In our case, this means that if the WHERE clause doesn't match anything, the variable is never bound, and so nothing will be deleted. In other words, it sounds like we get this behavior from the standard query language.
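The quoted rule can be sketched as follows, assuming a simplified term encoding (variables start with `?`, literals with a double quote), WHERE solutions given as dicts from variable to term, and a hypothetical helper `instantiate`. It is not an engine, just the template-instantiation step the spec describes.

```python
def is_var(term):
    return term.startswith("?")

def is_literal(term):
    # Assumed encoding: literals are written with a leading double quote.
    return term.startswith('"')

def instantiate(template, solutions):
    """Instantiate a DELETE or INSERT template against the WHERE solutions.

    Following SPARQL 1.1 Update, a triple that would contain an unbound variable, or a
    literal in subject or predicate position, is silently left out: no error is raised."""
    triples = set()
    for solution in solutions:
        for pattern in template:
            terms = tuple(solution.get(t) if is_var(t) else t for t in pattern)
            if None in terms or is_literal(terms[0]) or is_literal(terms[1]):
                continue
            triples.add(terms)
    return triples

template = [("<#meeting>", "<#color>", "?old")]
no_match = instantiate(template, [])                 # WHERE had no solutions
unbound = instantiate(template, [{}])                # one solution, ?old left unbound
bound = instantiate(template, [{"?old": '"blue"'}])  # ?old bound to "blue"
```

Both `no_match` and `unbound` come out empty, which is exactly the "nothing will be deleted, and no error" behavior.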

However, it is true that standard SPARQL raises no error in this case, as a measure against leaking information. Thus, we must expect that standard SPARQL implementations may need substantial modifications to accommodate this case.

To accommodate this situation in the Solid protocol, I wonder if it would suffice to say something like:

"If any solution produces a triple containing an unbound variable or an illegal RDF construct, then the server MUST abort any modifications and respond with a 409 status code."

That seems like a relatively minor change compared to standard SPARQL. AFAICS, this wouldn't violate the language; it is a protocol matter.
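A minimal sketch of the proposed wording, under the same assumptions as a toy model (set-based graph, variables prefixed with `?`, literals with a double quote) and with hypothetical names `Conflict`, `instantiate_strict` and `patch`: instead of dropping an offending triple, instantiation aborts, and the server answers 409 before touching the graph.

```python
class Conflict(Exception):
    """Signals that the server must abort all modifications and answer 409."""

def instantiate_strict(template, solutions):
    """Like standard instantiation, but abort instead of silently skipping a triple."""
    triples = set()
    for solution in solutions:
        for pattern in template:
            terms = tuple(solution.get(t) if t.startswith("?") else t for t in pattern)
            # Assumed encoding: variables start with "?", literals with a double quote.
            if None in terms or terms[0].startswith('"') or terms[1].startswith('"'):
                raise Conflict(f"cannot instantiate {pattern} with {solution}")
            triples.add(terms)
    return triples

def patch(graph, delete, insert, solutions):
    """Apply DELETE/INSERT ... WHERE to a set-based graph; return an HTTP status code."""
    try:
        to_delete = instantiate_strict(delete, solutions)
        to_insert = instantiate_strict(insert, solutions)
    except Conflict:
        return 409  # nothing has been modified at this point
    graph -= to_delete
    graph |= to_insert
    return 204

BLUE = ("<#meeting>", "<#color>", '"blue"')
RED = ("<#meeting>", "<#color>", '"red"')
graph = {BLUE}
delete_tmpl = [("<#meeting>", "<#color>", "?old")]
insert_tmpl = [RED]

status_unbound = patch(graph, delete_tmpl, insert_tmpl, [{}])  # ?old unbound: 409
status_bound = patch(graph, delete_tmpl, insert_tmpl, [{"?old": '"blue"'}])
```

Note that, as worded, the rule only fires when some solution exists: with zero WHERE solutions no triple is produced, so this sketch still returns 204 with nothing changed. A server wanting the "zero matches means abort" behavior would need a separate check for that case.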

I have only seen the semaphore mechanism mentioned in connection with the DELETE INSERT WHERE case; does that mean we only have to consider it in that context?

In the case of a pure delete operation, you don't actually care whether someone has deleted the data before you: it is gone, and all is fine, right?

It seems to me that the DELETE DATA; INSERT DATA case is a bigger problem than the DELETE INSERT WHERE case, because in SPARQL these are separate operations, even when they are executed in a single HTTP request. So if two clients each send such a request, both INSERT DATA operations end up committed, isolation or not.
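A minimal sketch of why isolation does not help here, assuming a set-based graph and hypothetical helpers `delete_data`, `insert_data` and `update`. The two requests run strictly one after the other, yet the second client's DELETE DATA silently matches nothing under standard SPARQL semantics, and both new values survive.

```python
def delete_data(graph, triples):
    # Standard SPARQL: deleting a triple that is not present is not an error.
    graph -= triples

def insert_data(graph, triples):
    graph |= triples

def update(graph, old, new):
    """One request with two operations: DELETE DATA { old } ; INSERT DATA { new }."""
    delete_data(graph, {old})
    insert_data(graph, {new})

NAME = ("<#me>", "<http://www.w3.org/2006/vcard/ns#fn>")
graph = {NAME + ('"Kjetil"',)}

# Both clients have read "Kjetil" and each wants to rename it.
update(graph, NAME + ('"Kjetil"',), NAME + ('"Ruben"',))  # client A
update(graph, NAME + ('"Kjetil"',), NAME + ('"Tim"',))    # client B: DELETE DATA is a no-op
```

After both requests, `graph` holds two `vcard:fn` values, `"Ruben"` and `"Tim"`.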

The easy way out is to say that developers should always use DELETE INSERT WHERE if they need semaphore behavior. As far as I can see, that would be compatible with NSS.

The alternative is to say that if DELETE DATA deletes nothing, then the server must return 409, which is an actual spec violation. It may not make much practical difference, I suppose, but it is still a spec violation.

Oct 14 '21 22:10 kjetilk

https://www.w3.org/TR/sparql11-update/#updateLanguage

If multiple operations are present in a single request, then a result of failure from any operation MUST abort the sequence of operations, causing the subsequent operations to be ignored.

My interpretation is that, given a requested DELETE/INSERT operation, if the DELETE fails, then the rest (the INSERT) should be ignored. I don't see the INSERT failing (is that feasible?) or preceding the DELETE.

Oct 18 '21 08:10 csarven

That's actually a different thing. A DELETE INSERT WHERE query is a single operation. You could also formulate a sequence of operations that looks similar:

DELETE WHERE { ?foo a <Bar> } ;
INSERT { ?foo a <Baz> } WHERE { ?foo a <Bar> }

are two operations in a single request, whereas

DELETE { ?foo a <Bar> } 
INSERT { ?foo a <Baz> } WHERE { ?foo a <Bar> }

is a single operation.
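The distinction also matters for the result: within a request, each operation runs against the store as left by the previous one, while a single DELETE/INSERT operation evaluates its WHERE clause once, before any change. A toy sketch, assuming one triple pattern per WHERE clause, a set-based graph, and hypothetical helpers `match`, `bind` and `delete_insert_where` (`"a"` stands in for `rdf:type`):

```python
def match(graph, pattern):
    """Solutions (variable -> term dicts) of a single triple pattern against the graph."""
    solutions = []
    for triple in graph:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:
                    break  # the same variable would need two different values
            elif p != t:
                break
        else:
            solutions.append(binding)
    return solutions

def bind(pattern, solution):
    return tuple(solution.get(t, t) for t in pattern)

def delete_insert_where(graph, delete, insert, where):
    """One SPARQL Update operation: WHERE is evaluated once, before any change."""
    solutions = match(graph, where)
    graph -= {bind(p, s) for s in solutions for p in delete}
    graph |= {bind(p, s) for s in solutions for p in insert}

def data():
    return {("<#a>", "a", "<Bar>"), ("<#b>", "a", "<Bar>")}

FOO_BAR = ("?foo", "a", "<Bar>")
FOO_BAZ = ("?foo", "a", "<Baz>")

# DELETE { ?foo a <Bar> } INSERT { ?foo a <Baz> } WHERE { ?foo a <Bar> }
single = data()
delete_insert_where(single, [FOO_BAR], [FOO_BAZ], FOO_BAR)

# DELETE WHERE { ?foo a <Bar> } ; INSERT { ?foo a <Baz> } WHERE { ?foo a <Bar> }
sequence = data()
delete_insert_where(sequence, [FOO_BAR], [], FOO_BAR)
delete_insert_where(sequence, [], [FOO_BAZ], FOO_BAR)  # WHERE now finds no <Bar> left
```

Here `single` ends up with both resources typed `<Baz>`, while `sequence` ends up empty, because the second WHERE clause no longer matches anything.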

I asked in https://github.com/solid/specification/issues/125#issuecomment-944345692 whether we should support several operations in one request, and @RubenVerborgh's response was "not yet". It is indeed possible to design something around this, but the key problem in our case is that in SPARQL, a DELETE that doesn't delete anything is not a failure, whereas in Solid it is.

Oct 18 '21 08:10 kjetilk

I found that I have not yet fully understood the problem.

Here is a typical case that people had in mind for this with Databrowser:

My color of a meeting participation record is blue
I want to change that color to red
So a PATCH request is created to delete blue and add red.
However, if there were zero matches for blue, then the color has already been changed. So abort.
If there was more than one match for the color, then the state is broken, so abort.
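The rule above can be sketched as a single check-then-replace, reading "more than one match" as the record having more than one color value. This is a hypothetical approximation for illustration, not the rdflib.js code linked below; the predicate `<http://example.org/ns#color>` and the helper `semaphore_replace` are made up, and the graph is a set of triples.

```python
def semaphore_replace(graph, subject, predicate, old, new):
    """Replace the single value of (subject, predicate) if it is still `old`.

    Returns 409 without changing anything when the value was changed already
    (zero matches for `old`) or when there is more than one value (broken state)."""
    current = {o for (s, p, o) in graph if s == subject and p == predicate}
    if old not in current or len(current) > 1:
        return 409
    graph.discard((subject, predicate, old))
    graph.add((subject, predicate, new))
    return 204

REC, COLOR = "<#participation>", "<http://example.org/ns#color>"
graph = {(REC, COLOR, '"blue"')}

first = semaphore_replace(graph, REC, COLOR, '"blue"', '"red"')     # blue -> red
second = semaphore_replace(graph, REC, COLOR, '"blue"', '"green"')  # blue is gone: abort

broken = {(REC, COLOR, '"blue"'), (REC, COLOR, '"green"')}
third = semaphore_replace(broken, REC, COLOR, '"blue"', '"red"')    # two values: abort
```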

The actual code for this can be traced back all the way to rdflib.js, as can be seen here: https://github.com/linkeddata/rdflib.js/pull/298/files

I asked in #125 (comment) whether we should support several operations in one request, and @RubenVerborgh's response was "not yet".

Yes; unfortunately, I have now seen that Databrowser and NSS interact as follows:

DELETE DATA {
  <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Kjetil" .
} ;
INSERT DATA {
  <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Ruben" .
}

Note that this is done with two operations, without a WHERE clause, and without any variables to match.

And unfortunately, it follows the same logic: if the deletion fails because there are zero or multiple matches, then neither the insertion nor the deletion happens.
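A minimal sketch of that all-or-nothing behavior for DELETE DATA; INSERT DATA, assuming a set-based graph, a single lock that serializes PATCHes to the resource (hypothetical; a real server would do this in its storage layer), and a hypothetical `patch_data` helper. It needs no query engine: the check is just a subset test.

```python
import threading

_store_lock = threading.Lock()  # hypothetical: serializes PATCHes to one resource

def patch_data(graph, delete, insert):
    """DELETE DATA { delete } ; INSERT DATA { insert } with the semaphore rule.

    Unlike standard SPARQL, a DELETE DATA that would remove a triple which is not
    present aborts the whole request: neither the deletion nor the insertion happens."""
    with _store_lock:  # check and write as one atomic step
        if not delete <= graph:
            return 409
        graph -= delete
        graph |= insert
        return 204

FN = ("<https://ruben2021.solidcommunity.net/profile/card#me>",
      "<http://www.w3.org/2006/vcard/ns#fn>")
graph = {FN + ('"Kjetil"',)}

a = patch_data(graph, {FN + ('"Kjetil"',)}, {FN + ('"Ruben"',)})  # client A
b = patch_data(graph, {FN + ('"Kjetil"',)}, {FN + ('"Tim"',)})    # client B, stale view
```

Client A succeeds and client B receives 409, so the graph holds only `"Ruben"`, in contrast with standard SPARQL semantics, where both new names would be inserted.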

Oct 18 '21 12:10 RubenVerborgh

Right, I suspected as much for the DELETE DATA; INSERT DATA case, but I can't think of any way to reconcile the multiple-match case with SPARQL. I need to think further about that.

Oct 18 '21 14:10 kjetilk

So, one possibility is that we don't allow full SPARQL for 0.9, and only allow the constrained subset with that mechanism. I don't like that option myself, but what do others think?

Oct 18 '21 14:10 kjetilk

-1 on the mechanism (without any special content type or profile)

Oct 18 '21 15:10 RubenVerborgh

Hmm, yeah, but actually, perhaps we should have a content type for it. With something that incompatible with application/sparql-update, we'd need one anyway.

Oct 18 '21 21:10 kjetilk

I might also add that exactly this kind of query:

DELETE DATA {
  <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Kjetil" .
} ;
INSERT DATA {
  <https://ruben2021.solidcommunity.net/profile/card#me> <http://www.w3.org/2006/vcard/ns#fn> "Ruben" .
}

was the reason I originally proposed having just DELETE DATA; INSERT DATA as the subset. If you don't have any variables, it is an awful lot easier to have exactly one match :-) If all the queries that require exactly one match could be rewritten like this one, that would also make life much easier.

Oct 20 '21 12:10 kjetilk

@kjetilk I might be on board with that; just what do we do with blank nodes? (hah)

Oct 20 '21 12:10 RubenVerborgh

@kjetilk I might be on board with that; just what do we do with blank nodes? (hah)

I think it should be straightforward to follow the SPARQL Update spec here (for DELETE DATA; INSERT DATA):

the INSERT DATA statement only allows to insert ground triples. Blank nodes in QuadDatas are assumed to be disjoint from the blank nodes in the Graph Store, i.e., will be inserted with "fresh" blank nodes.

in a DELETE DATA operation neither variables nor blank nodes are allowed
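Following those two rules could look roughly like this, assuming terms as strings with `?` for variables and `_:` for blank-node labels, and hypothetical helpers `check_delete_data` and `prepare_insert_data`. Fresh labels come from `uuid4`, which is assumed never to clash with labels already in the store.

```python
import uuid

def is_var(term):
    return term.startswith("?")

def is_bnode(term):
    return term.startswith("_:")

def check_delete_data(triples):
    """DELETE DATA: neither variables nor blank nodes are allowed."""
    for triple in triples:
        if any(is_var(t) or is_bnode(t) for t in triple):
            raise ValueError(f"not allowed in DELETE DATA: {triple}")

def prepare_insert_data(triples):
    """INSERT DATA: no variables; blank-node labels are replaced by fresh ones.

    A label used several times in one request maps to the same fresh blank node."""
    fresh = {}
    prepared = set()
    for triple in triples:
        if any(is_var(t) for t in triple):
            raise ValueError(f"not allowed in INSERT DATA: {triple}")
        renamed = []
        for t in triple:
            if is_bnode(t):
                if t not in fresh:
                    fresh[t] = "_:" + uuid.uuid4().hex  # assumed not to clash with the store
                t = fresh[t]
            renamed.append(t)
        prepared.add(tuple(renamed))
    return prepared

inserted = prepare_insert_data({("_:x", "<#p>", '"1"'), ("_:x", "<#q>", '"2"')})
subjects = {s for (s, _, _) in inserted}

try:
    check_delete_data({("_:x", "<#p>", '"1"')})
    delete_rejected = False
except ValueError:
    delete_rejected = True
```

Since a DELETE DATA can never name a blank node, a semaphore check on it only ever concerns ground triples, each of which matches at most once.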

Oct 20 '21 13:10 rubensworks

I have now made an alternative PR in #330, in which I try out what @timbl has been proposing, i.e. REMOVE and REMOVE DATA.

I don't quite feel that we have exhaustively answered the question:

"When using the semaphore mechanism, would the triples that you want to change be known before the PATCH request?"

I bet there are desirable cases where that's not quite true, but are they significant enough?

My feeling around this is:

The case for REMOVE DATA; INSERT DATA is pretty strong: it is an atomic update operation with 409 Conflict resolution. It could be hacked together in an afternoon and wouldn't need a query engine at all. I could see this entering the SPARQL spec.

The case for REMOVE INSERT WHERE, where unbound solutions cause a failure in SPARQL Update terms, is also quite good. You'd need a SPARQL engine pretty quickly, but modifying one isn't too hard. I think it would be harder to argue for in front of a SPARQL WG, and I would personally prefer a broader approach, but it can be done.

What I really struggle with is the idea that there should be a failure when the query has multiple solutions. Yes, that might indicate that something is wrong, but it might not: there may be perfectly legitimate reasons for multiple solutions. And if there really is a problem, this conflict resolution mechanism would be the wrong way to catch it, since there are many ways other than an edit conflict that it could have been introduced. That belongs more in the realm of shape validation and the like. It also departs very much from my understanding of what SPARQL is.

I have nevertheless included that language in #330, but I'd rather leave it out.

Oct 21 '21 22:10 kjetilk
