Proper `Cache-Control: private` support

Open kyrias opened this issue 8 years ago • 7 comments

Responses where Cache-Control is set to private may be cached, but must not be stored in a shared cache. I use cachecontrol in an API client I'm writing, and since the client supports multiple users, I need to implement user-specific caching soon.

My idea for how to implement it is to have a parameter you pass to CacheControl with a header name. Then, when a response has Cache-Control: private set, it would try to fetch the header with that name from the request/response and prepend a hash of its value to the cache key.
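
A minimal sketch of that key scheme, under stated assumptions: `private_cache_key` and its `user_header` parameter are hypothetical names, not part of cachecontrol, and headers are modeled as plain dicts with exact-case keys. Reading the header from the request only, and returning `None` to mean "do not cache" when it is missing, is one possible choice among several.

```python
import hashlib


def private_cache_key(url, request_headers, response_headers, user_header="Authorization"):
    """Return a cache key, scoped per user when the response is private.

    Hypothetical helper for illustration; not a cachecontrol API.
    """
    cache_control = response_headers.get("Cache-Control", "")
    # Keep only the directive names, e.g. "max-age=60" -> "max-age".
    directives = {
        part.strip().split("=", 1)[0].lower() for part in cache_control.split(",")
    }
    if "private" not in directives:
        # Not private: the plain URL is used as the key.
        return url

    identity = request_headers.get(user_header)
    if identity is None:
        # Assumption: without an identifying header, refuse to cache.
        return None

    # Hash the header value so the raw credential never appears in the key.
    digest = hashlib.sha256(identity.encode("utf-8")).hexdigest()
    return f"{digest}:{url}"
```

With this shape, two users requesting the same URL get different keys, so one user's private response cannot be served to the other.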

I currently have it implemented locally, though it's very implementation-specific: it only checks Authorization and only works with the redis backend.

Any thoughts/opinions on this idea?

kyrias avatar Feb 22 '17 22:02 kyrias

That is a tough one! I'm curious about what qualifies as "shared". For example, if I were using the file cache, would using a different directory be enough to be considered private? I suppose the same goes for the redis cache where private and public items might be cached.

As for an implementation, I think what I would do is define something like a namespace function that is passed to the CacheControl wrapper. If defined, the function would return a prefix key that would be prepended when storing it in the cache.

That sounds pretty similar to what you're doing, except that a function is passed in, which allows anything in the request to be used when constructing the private namespace.
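
As a rough sketch of the namespace-function idea (all names here are hypothetical, and the cache is a plain dict rather than a real cachecontrol backend), the wrapper could take a callable that maps a request to a prefix. Requests are modeled as plain dicts for the sake of the example.

```python
import hashlib


class NamespacedCache:
    """Dict-backed cache that prefixes keys with a caller-supplied namespace.

    Hypothetical class for illustration; not a cachecontrol API.
    """

    def __init__(self, namespace=None):
        # namespace: optional callable taking the request and returning a prefix.
        self._namespace = namespace
        self._data = {}

    def _key(self, request, key):
        if self._namespace is None:
            return key
        return f"{self._namespace(request)}:{key}"

    def get(self, request, key):
        return self._data.get(self._key(request, key))

    def set(self, request, key, value):
        self._data[self._key(request, key)] = value


def by_authorization(request):
    """Example namespace function: a short hash of the Authorization header."""
    token = request.get("headers", {}).get("Authorization", "")
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


cache = NamespacedCache(namespace=by_authorization)
```

Because the callable receives the whole request, the namespace could just as well be derived from a cookie, a query parameter, or a client certificate.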

I hope that helps a bit! It sounds like an interesting problem and I'm interested to see how you solve it!

ionrock avatar Feb 22 '17 22:02 ionrock

Essentially it shouldn't be shared as in it shouldn't be possible for the result to be returned for other users.

I do like the idea of a namespace function though, hm. I'll try it out, thanks!

kyrias avatar Feb 23 '17 12:02 kyrias

This issue just dinged me pretty hard.

Looking at the standard for this:

https://tools.ietf.org/html/rfc7234#page-21

   is intended for a single user and MUST NOT be stored by a shared
   cache.  A private cache MAY store the response and reuse it for later
   requests, even if the response would normally be non-cacheable.

This is something cachecontrol should not be doing at all, unless specifically enabled in a heuristic which you already have sufficient support for.

I was going mental trying to figure out how something with Cache-Control: private was getting cached and then I ripped apart the cachecontrol code proving it.

Pretty please don't break standards like that :(

jowrjowr avatar Aug 04 '17 20:08 jowrjowr

@jowrjowr My understanding, which I'll admit might be wrong, is that a shared cache would be something like a caching proxy where multiple users are sharing static resources in the same cache. In a client, it gets murky. For example, if I have an API client and I want a shared directory, the content might be exactly the same for every process, in which case I'd argue that the cache is not "shared": even though different processes are using the cache, there is still a single "user" utilizing the cached values.

A good example is the use case that drove me to write CacheControl in the first place. I had a REST-based gettext replacement and I wanted to have a shared cache between processes on the same machine. Each program that instantiated the gettext functionality would get a client configured with CacheControl using a local bsddb database. This allowed a client in one process to cache a value that another process might later need. From this perspective, I'm considering the cache "private" as it only pertains to a single set of users that share the same use case.

Maybe this is the wrong way to think about it! Please let me know your use case and hopefully we can find a better solution.

ionrock avatar Aug 04 '17 21:08 ionrock

The use case is a multi-user (1000+ users) environment which really appreciates caching of commonly fetched content. The URLs are always distinct (except in the case that triggered this), so redis fills up with cache objects and everyone is happy. This all operates on the back end, but is shared between users.

The problem came up with what was an oauth token verification URL.

The example would be https://site.com/verify with a specific header carrying the Authorization bearer token. The response from the site has Cache-Control: private, which, according to the spec, ought not to be cached in general. I acknowledge there are cases where you might want to do that, but I would not consider that to be the default.

Thankfully there's a different way to use that URL with an in-url parameter, but I really didn't expect this to be an issue at all.

My use case is going to be common to anyone who works with 3rd party APIs.

jowrjowr avatar Aug 04 '17 22:08 jowrjowr

@jowrjowr Ah OK, I can see why that would be difficult. I'm reluctant to simply stop caching when private is declared, but I do think it would make sense to define a cache as "shared" or not. If the cache is declared as shared, then the private directive will be honored as the spec prescribes.

That would require some code changes:

cache = RedisCache(redis_url, shared=True)
sess = CacheControl(requests.Session(), cache=cache)

That seems pretty reasonable to me and reflects more closely your point about the spec.
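
To illustrate how such a flag might gate storage, here is a small sketch. The helper names `parse_directives` and `may_store` are hypothetical and not part of cachecontrol, and the parser is deliberately simplified: it splits on commas, so it would mishandle quoted arguments such as `private="Set-Cookie, X-Foo"`.

```python
def parse_directives(cache_control):
    """Return the set of lowercase directive names in a Cache-Control value.

    Simplified: splits on commas and ignores directive arguments.
    """
    directives = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return directives


def may_store(cache_control, shared):
    """Decide whether a response may be stored, given the cache's shared flag."""
    directives = parse_directives(cache_control)
    if "no-store" in directives:
        # no-store applies to every cache, shared or not.
        return False
    if shared and "private" in directives:
        # A shared cache must not store private responses.
        return False
    return True
```

A cache declared with `shared=True` would then skip private responses, while a per-user cache would keep storing them as it does today.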

ionrock avatar Aug 07 '17 01:08 ionrock

Nice! That's pretty reasonable.

jowrjowr avatar Aug 07 '17 01:08 jowrjowr