
Timestamps and ACL caching

Open kjetilk opened this issue 6 years ago • 10 comments

I started thinking about the design of an ACL cache, and figured that some metadata on the authorizations, using Dublin Core (DC) properties, could help with that.

This proposal covers both caching individual authorizations, as a specialized reverse proxy or a Solid app could do, and recommendations for Solid servers to implement so that legacy HTTP caches can cache ACL resources.
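To make the HTTP-cache side of the proposal concrete, here is a minimal Python sketch. It assumes the authorizations have already been parsed into plain objects and reads `dct:valid` as a "valid until" instant; `Authorization` and `acl_cache_headers` are hypothetical names, not part of any Solid server. It derives `Last-Modified` for the whole ACL resource from the newest `dct:modified`, and a `max-age` from the earliest `dct:valid`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


@dataclass
class Authorization:
    """Hypothetical in-memory view of one acl:Authorization plus the
    proposed Dublin Core timestamps (all timezone-aware)."""
    iri: str
    issued: datetime           # dct:issued
    modified: datetime         # dct:modified
    valid: Optional[datetime]  # dct:valid, read here as "valid until" (assumption)


def acl_cache_headers(auths: list, now: datetime) -> dict:
    """Derive HTTP cache headers for a whole ACL resource from the
    timestamps on its individual authorizations."""
    headers = {}
    if not auths:
        return headers
    # The ACL document last changed when its newest authorization did.
    last_modified = max(a.modified for a in auths).astimezone(timezone.utc)
    headers["Last-Modified"] = format_datetime(last_modified, usegmt=True)
    # The document stays fresh only as long as its shortest-lived authorization.
    expiries = [a.valid for a in auths if a.valid is not None]
    if expiries:
        remaining = int((min(expiries) - now).total_seconds())
        headers["Cache-Control"] = f"max-age={max(remaining, 0)}"
    return headers
```

Taking the minimum of the `dct:valid` values is a conservative choice: a legacy cache that only sees the whole document can never keep it longer than its shortest-lived authorization allows.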

I think this should be in the WAC spec, so I submit it as a proposal for future consideration.

kjetilk, Feb 18 '19

wrt adding timestamp rdf properties, using them for HTTP cache headers: I agree, and is this unique to Authorizations? To me it seems like this advice is good for any type of thing coming out of a solid server. Curious if others agree.

Without blocking this particular PR, should this advice be 'lifted' into a more general part of the solid specs?

gobengo, Mar 28 '19

> wrt adding timestamp rdf properties, using them for HTTP cache headers: I agree, and is this unique to Authorizations? To me it seems like this advice is good for any type of thing coming out of a solid server. Curious if others agree.

Yeah, actually, I agree. :-) But, I note that for the WAC spec, we can specify how the RDF graph that specifies the authorization looks, but for other data, that is not so much the case...

> Without blocking this particular PR, should this advice be 'lifted' into a more general part of the solid specs?

...so, in the interest of orthogonal specifications, I think this should be considered independently from other specs.

kjetilk, Mar 28 '19

I agree that any RDF source should produce an ETag header, so that clients can request it with an If-None-Match header, or use the similar, less granular mechanism based on Last-Modified and If-Modified-Since.
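A minimal sketch of that conditional-GET logic in Python, assuming request headers arrive as a dict with canonical capitalization; `conditional_get` is a hypothetical helper. It follows the HTTP rule that `If-None-Match` takes precedence over `If-Modified-Since`, and uses weak ETag comparison as HTTP specifies for `If-None-Match`. The comma split of the header is deliberately naive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def _opaque(tag: str) -> str:
    # Weak comparison ignores the W/ prefix.
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(etag: str, last_modified: datetime, req: dict) -> tuple:
    """Return (status, response headers) for a GET on an RDF source.
    `req` maps request header names to values; `last_modified` is aware."""
    resp = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified.astimezone(timezone.utc),
                                         usegmt=True),
    }
    if_none_match = req.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        tags = {_opaque(t.strip()) for t in if_none_match.split(",")}
        if "*" in tags or _opaque(etag) in tags:
            return 304, resp
        return 200, resp
    if_modified_since = req.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored
        if since is not None:
            if since.tzinfo is None:  # "-0000" dates parse as naive
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, resp
    return 200, resp
```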

I don't see why you would put those timestamps inside the data, though?

michielbdejong, Apr 25 '19

@michielbdejong - It's worth noting that many systems do not properly track "Modified" dates for files, blurring lines with "Touched" and "Opened" (among other actions). Tracking modification datetime info explicitly within the data thus can have value.

That said, having worked with multiple systems that use such internal tracking, I can also say that relying on humans to (remember to) (accurately) do the work of changing those dates is similarly fraught with peril, so it would be good if increasingly intelligent technology could be brought to bear on it.

TallTed, Apr 25 '19

@kjetilk I like this proposal. Especially the 'issued' and 'modified' attributes.

I'm a bit unsure about the 'valid' predicate, though. Is the intention to just use it for cache control? In which case, maybe an explicit cache control header from an http-headers ontology would be clearer?

If the intention is broader, I suspect this might be a bit confusing in terms of user interface and usability. What's the pain point that the 'valid' term is solving?

dmitrizagidulin, Apr 25 '19

> many systems do not properly track "Modified" dates for files

That's irrelevant for the Solid spec, right? We can just warn against that in the spec, saying, beware if you implement your storage layer directly on a file system, looking at the mtime might not be good enough to implement proper ETags. The server should probably generate the ETag in code, and store it explicitly alongside the data?
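One way to read the suggestion to generate the ETag in code: a hypothetical storage wrapper (`ResourceStore`, not taken from any existing server) that computes a strong ETag from a SHA-256 hash of the body on every write, and records the modification time as a separate field, rather than deriving either value from the file's mtime.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class StoredResource:
    body: bytes
    etag: str           # computed by the server from the content
    modified: datetime  # recorded by the server at write time, not read from the filesystem


class ResourceStore:
    """Hypothetical in-memory storage layer that tracks ETag and
    modification time explicitly on every write."""

    def __init__(self) -> None:
        self._data: dict = {}

    def put(self, path: str, body: bytes) -> StoredResource:
        # Strong ETag: a quoted hash of the exact stored bytes.
        etag = '"' + hashlib.sha256(body).hexdigest() + '"'
        res = StoredResource(body, etag, datetime.now(timezone.utc))
        self._data[path] = res
        return res

    def get(self, path: str) -> StoredResource:
        return self._data[path]
```

Writing identical bytes twice yields the same ETag even though the recorded time changes, so the two values stay independent of each other.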

I'm not (yet) convinced that the possible advantages of the changes suggested in this PR merit their cost.

In any case, and in a separate note, I think we should use a versioned spec, not a living document, so the 0.7 spec as it stands now will forever keep pointing at its snapshot versions of the various sub-specs, and at some point we need to do a triage round to establish which proposals would be eligible for going into the 0.8 spec (and I'm hoping we can postpone that until at least the end of 2018).

michielbdejong, Apr 26 '19

> I don't see why you would put those timestamps inside the data, though?

I tried to explain that in the spec itself, but I'll be happy to clarify further. There are a few reasons for this. It gives increased granularity, as you can cache individual authorizations, not just at the "ACL file" level. It also provides orthogonality to HTTP: you don't need to rely on things being served over HTTP to use caching, which I think will become very important soon in an IoT world. Finally, you don't need a separate storage layer to manage these times; they are right there in the authorizations.
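A sketch of caching at the granularity of individual authorizations, assuming each authorization carries its own `dct:modified` and has been parsed into a `CachedAuthorization` value (a hypothetical type). Only entries whose timestamp moved forward are replaced, so a change to one authorization does not invalidate the rest of the ACL document. Removing deleted authorizations is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CachedAuthorization:
    iri: str            # IRI of the acl:Authorization
    modified: datetime  # its dct:modified value
    triples: frozenset  # the authorization's own statements


class AuthorizationCache:
    """Hypothetical cache whose unit is a single authorization, not an ACL file."""

    def __init__(self) -> None:
        self._entries: dict = {}

    def refresh(self, fetched: list) -> list:
        """Merge freshly fetched authorizations; return the IRIs that changed."""
        changed = []
        for auth in fetched:
            old = self._entries.get(auth.iri)
            # Replace an entry only when its dct:modified moved forward.
            if old is None or auth.modified > old.modified:
                self._entries[auth.iri] = auth
                changed.append(auth.iri)
        return changed

    def get(self, iri: str):
        return self._entries.get(iri)
```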

kjetilk, Apr 26 '19

> many systems do not properly track "Modified" dates for files

> That's irrelevant for the Solid spec, right? We can just warn against that in the spec, saying, beware if you implement your storage layer directly on a file system, looking at the mtime might not be good enough to implement proper ETags. The server should probably generate the ETag in code, and store it explicitly alongside the data?

I think we should keep mtime and ETag completely separate. They are two different things. ETags can be computed in many ways, and it is very important to get them right, but that is orthogonal to mtime.

So, there are two topics of importance here, one is caching generally, and one is an implementation detail of an ACL cache.

As I said above, caching generally should not have to rely on the "ACL file" as the smallest unit of caching, it should be possible for a specialized cache to consider each individual authorization as the smallest unit.

For the ACL cache that needs to be present in the actual authorization process for performance reasons, it is also a matter of practicality: you don't want to look up the mtime on the backend if you can rely on the values in your ACL cache being correct, and your ACL cache should not need to consult the backend at all. The ACL cache should basically be an in-memory quad store that can be queried for authorizations really fast, and getting the mtimes from the ACL cache should also be a really fast operation.
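A minimal sketch of the in-memory quad store idea, assuming literal values are stored as already-parsed Python objects. `MemoryQuadStore` and `authorizations_for` are hypothetical names; the namespace IRIs are the WAC (`acl:`) and DCMI Terms (`dct:`) vocabularies. Both the authorization lookup and its `dct:modified` value come from dictionary indexes, with no call to the backend.

```python
ACL = "http://www.w3.org/ns/auth/acl#"
DCT = "http://purl.org/dc/terms/"


class MemoryQuadStore:
    """Hypothetical minimal in-memory quad store with two indexes."""

    def __init__(self) -> None:
        self._by_po: dict = {}  # (predicate, object) -> {(subject, graph)}
        self._by_sp: dict = {}  # (subject, predicate) -> {object}

    def add(self, s, p, o, g) -> None:
        self._by_po.setdefault((p, o), set()).add((s, g))
        self._by_sp.setdefault((s, p), set()).add(o)

    def subjects(self, p, o) -> set:
        return {s for s, _g in self._by_po.get((p, o), ())}

    def objects(self, s, p) -> set:
        return set(self._by_sp.get((s, p), ()))


def authorizations_for(store: MemoryQuadStore, resource: str) -> dict:
    """Map each authorization with acl:accessTo <resource> to its latest
    dct:modified value (None if absent), using only index lookups."""
    result = {}
    for auth in store.subjects(ACL + "accessTo", resource):
        modified = store.objects(auth, DCT + "modified")
        result[auth] = max(modified) if modified else None
    return result
```

A real implementation would also need `acl:default` inheritance and graph-aware indexes; the sketch only shows the timestamps living next to the authorizations they describe.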

So, either you store it with the authorization itself, or you store it in a separate resource, and I think the latter would be a bad design. To say that an authorization itself has been modified at a certain time is exactly what you should say, and this is saying it.

kjetilk, Apr 26 '19

> @kjetilk I like this proposal. Especially the 'issued' and 'modified' attributes.

> I'm a bit unsure about the 'valid' predicate, though. Is the intention to just use it for cache control? In which case, maybe an explicit cache control header from an http-headers ontology would be clearer?

> If the intention is broader, I suspect this might be a bit confusing in terms of user interface and usability. What's the pain point that the 'valid' term is solving?

Yeah, well, I'm not sure the max-age has a good place in the HTTP headers ontology, but if it did, it would indeed be more precise.

So, I mostly chose the dct:valid predicate for consistency with the others, and for possible unexpected reuse. You never know what people might use it for if an authorization is said to be valid up to a certain time. :-)

Now, it was motivated by my observation that parsing a simple ACL file cost about 200 ms. That's quite a lot. We will be looking up ACL files for pretty much everything, so every millisecond will count once we get to the point where UX is based on integrating a lot of resources. Not having to look up an ACL at all, and instead using it straight from an ACL cache, could save a lot of milliseconds. :-)
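A sketch of the fast path this would enable, assuming an in-process cache keyed by ACL document IRI and a caller-supplied `fetch_and_parse` function (hypothetical). That function is assumed to return the earliest `dct:valid` time together with the parsed authorizations. While that time has not passed, the ACL is neither fetched nor parsed.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def lookup_authorizations(cache: dict, acl_iri: str,
                          fetch_and_parse: Callable[[str], tuple],
                          now: Optional[datetime] = None) -> list:
    """Return the authorizations of one ACL document, skipping the fetch and
    parse entirely while the cached copy is within its dct:valid time."""
    if now is None:
        now = datetime.now(timezone.utc)
    entry = cache.get(acl_iri)
    if entry is not None:
        valid_until, auths = entry
        if valid_until is not None and now < valid_until:
            return auths  # fast path: no I/O and no parsing
    # Slow path: fetch and parse the ACL document, then remember the result.
    valid_until, auths = fetch_and_parse(acl_iri)
    cache[acl_iri] = (valid_until, auths)
    return auths
```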

It is certainly possible to use a different predicate for it, but I think it is a good fit myself.

kjetilk, Apr 26 '19

Needs a round of conflict resolution, @kjetilk

TallTed, Jul 31 '19