server-configs-nginx
server-configs-nginx copied to clipboard
Fix GCP CDN caching assets with max-age=0
Problem
GCP's (Google Cloud Platform) Cloud CDN caches responses with Cache-Control: max-age=0
and serves it from stale cache. However the application's intent is not to have those responses cached at all.
Solution
To overcome this, we can remove forcing the max-age=0
header. This also gives the underlying application the ability to set the Cache-Control
header itself. The application might have endpoints that should not be cached (Cache-Control: no-cache, private
) or want to control depending on the response (endpoint?) for how long it should be cached (e.g. Cache-Control: max-age=60
).
Thanks a lot for opening this PR, @allantatter!
GCP's (Google Cloud Platform) Cloud CDN caches responses with
Cache-Control: max-age=0
Interesting! Could you provide any documentation regarding this behavior?
This also gives the underlying application
Unfortunately this is where the suggested solution is reaching this boilerplate boundaries. H5BP configs template targets static files first (i.e. no application behind), before application scenarii. I believe (I might be wrong) when an application is behind, the server is going to require additional configuration anyway, and then you are free to customize the values.
I hope my point is clear enough! 😅
Unfortunately I did not find the reason why GCP Cloud CDN behaves like that in their documentation:
- https://cloud.google.com/cdn/docs/serving-stale-content
- https://cloud.google.com/cdn/docs/caching#expiration
However, that would not be the only reason why to change Expires from 0 to off. I would think the reason, why currently H5BP set Exipres to 0 is distinguish the data responses (JSON, XML, etc) and not to put the long-living caching headers to those responses. But the side-effect of forcing the Expires to be 0 is that it also overrides the Cache-Control headers set by the underlying application itself (both Cache-Control: no-cache
and Cache-Control: max-age=60
would become Cache-Control: max-age=0
). I would think the current H5BP solution oversteps its own boundaries and should not mutate the dynamic data requests :) It should just ignore those.
I know that I can change the H5BP config with my custom configuration but I'd like to stick to the default setup as much as possible so it would be easier to update my projects when any new releases are done here.
Hey @Malvoz, are you around? I'm curious if you have an opinion on this 😊
Wow! Why is an absurd request like this being approved or even considered? I'm concerned for this project.
Wouldn't off
stop NGINX from sending headers at all basically letting the browser decide what to do? This would bring back the unexpected/inconsistent behaviour which 0
was meant to prevent.
Why couldn't OP use a configuration of their own? Wasn't this project a baseline rather than one that caters to one person's specific needs (questionable needs at that as the described behaviour apparently isn't even documented)?
What is next? Getting rid of cache headers altogether because someone doesn't want their JavaScript cached?!
@StrangePeanut What are you talking about? :D
Wouldn't off stop NGINX from sending headers at all basically letting the browser decide what to do?
Nope, off
means that it won't interfere cache-control
headers and lets the application decide if and how it should be cached.
Wasn't this project a baseline rather than one that caters to one person's specific needs (questionable needs at that as the described behaviour apparently isn't even documented)?
"behaviour apparently isn't even documented" - documented was not the symptom however the main issue is not related to GCP caching. And why do you think I want to solve only my needs? I would then have my custom logic within my own project. I made the PR to fix issues for everyone here.
What is next?
Get less emotional and let's talk again after that.
off means that it won't interfere cache-control headers
That might be if the app includes them, else it would be completely up to the browser. This is bad practice and the reason why it is currently set to 0
, ie to provide a decent baseline.
documented was not the symptom however the main issue is not related to GCP caching
Your initial comment says otherwise. Typically you would want NGINX to serve resources directly (eg cached) whenever possible. In which case letting NGINX provide the appropriate headers is expected if not, necessary.
I made the PR to fix issues for everyone here.
Your use case simply does not apply to everyone, and merging this PR will inevitably cause more issues than it solves.
I agree that breaking changes should not be merged hastily. Although bugs should be fixed. So it depends whether we think this is a bug or just one way it could behave (and my PR would just be another approach to it).
The reason why I thought it is a bug came from this behaviour a saw with these 2 scenarios:
If there is a scenario where it makes sense that by default NGINX always overrides Cache-Control
value to max-age=0
then I agree that my PR would not necessarily improve the default configuration but just change it. It would make a lot of sense that if no Cache-Control
header is present, then NGINX could introduce some default values. But can it do that conditionally so it would not override Cache-Control
if it's already set? Then it may be applicable for everyone.
Your initial comment says otherwise.
I know yep that the problem surfaced for me because of GCP platform and this was misleading. I afterwards noticed the main underlying pain points.
This seems to be configuring h5bp for your particular use case (and perhaps others). Not every particular configuration change should result in a PR to upstream. Perhaps a wiki entry would be more appropriate contribution?
@jchimene So you think it's the expected behaviour that you see on my graph? I don't think it's rare case that the backend frameworks add Cache-Control
headers into the response.
@allantatter It's not a rare case, but neither is it pathological. To me, it's appropriate for a particular installation to configure this behavior for their app's behavior. I would not look for such configuration to be pushed upstream. To reiterate: this is not a nominal configuration issue as this PR suggests. I'd prefer the default configuration to "fail safe".
Having NGINX serve resources directly in one way or another to offload the backend is very common today. Providing no Cache Control headers in such a scenario would reintroduce the unexpected/inconsistent behaviour which a value of 0
was meant to prevent.
While this could probably be addressed through the use of conditions, it goes beyond the scope of providing a baseline configuration. Where does one draw the line? I can already think of several similar conditions that could be added. At the end of the day, it would bloat the project for the sake of a small number of users. As far as I know, this repository never offered a set-and-forget configuration; it already comes with several supplementary configurations that are not enabled/included by default for this exact reason.
Wow! Why is an absurd request like this being approved or even considered? I'm concerned for this project.
Literally anyone can "approve" a PR on Github. Check the user account, it's a new user that has nothing to do with this project and no activity at all prior to that action.
Unless you see an approval from @LeoColomb it is meaningless...
Literally anyone can "approve" a PR on Github.
Not only this is true, but my comments above are quite far from an approval. So please @StrangePeanut, let's try to keep the discussion healthy and put rude words aside, as it's hurting everyone here: the reporter, the maintainers, and the contributors.
Now, on the concrete side of the discussion, and as initially answered, I share the concerns of changing the configuration to what this PR is suggesting. In short, based on everyone replies:
- The root goal of H5BP server configs is static files;
- This default configuration should stay “fail-safe”;
-
off
as default value can raise significant issues for perf and security;- Can be seen as a regression;
-
0
as default value might raise issues as well;- This where we reach Nginx "weakness" on Cache-control configuration, or at least if we want to keep the config files healthy for maximum coverage.
Are we all aligned on this?
I'd also add that I think backend frameworks that do a proper work on Cache-control header are quite rare, or at least quite modern. By this extends, I suppose the situation where power user can disable Nginx overwriting when they are confident about their backend is better than the opposite.
Right?
I apologise, I commented with the assumption that this PR had been approved by someone with write permissions. Only now do I realise that this was not the case. Sorry.
Since OP's original issue is that content gets cached undesirably if served with a cache-control value of max-age=0
, how about changing it to -1
instead? This introduces the following changes:
Before
cache-control: max-age=0
expires: <current date and time>
After
cache-control: no-cache
expires: Thu, 01 Jan 1970 00:00:01 GMT
-1
ensures that content is absolutely never to be cached which is what we want. Therefore the change from 0
to -1
could be considered a general improvement while potentially addressing OP's issue at the same time.
@StrangePeanut I like your proposed solution, at least it fixes the initial issue. And if I need to add dynamic Cache-Control
headers then I can do that by adding custom configurations. If everyone (or most) agree then I can make the changes in the PR.
perbaiki asset
This proposition looks good to me as well. I'd appreciate a third approval before moving forward. What do you think about the suggestion @jchimene? (@Dreamsorcerer?)
+1 It has the virtues of clarity and self-documentation.
After putting it to the test, it appears that a value of -1
affects the cache control header as expected but not the expires header which may lead to unexpected behaviour again.
We really should be using a value of epoch
to produce the desired results which is in line with the documentation, https://nginx.org/en/docs/http/ngx_http_headers_module.html#expires
Test Results
expires 0;
cache-control: max-age=0
expires: <current date and time>
expires -1;
cache-control: no-cache
expires: <current date and time>
expires epoch;
cache-control: no-cache
expires: Thu, 01 Jan 1970 00:00:01 GMT
expires epoch
could be a better option, agreed. Will make that change.
My thoughts:
GCP should probable have a bug filed against it. Reading around, the expected HTTP behaviour is that it should only serve a stale resource if it has lost connection to the upstream server. With a value of 0, it would always be stale.
However, the recommended approach to always serve the latest version is to use no-cache
, so this looks like a sensible change.
One thing missing here though, and I couldn't find anything when searching the project, is the use of private
, to indicate that things should not be cached by a proxy, but should be cached by the browser.
In my nginx.conf, I currently have this:
map $sent_http_content_type $cache_control {
default "private, must-revalidate";
application/atom+xml "public";
application/rss+xml "public";
text/css "public";
application/javascript "public";
text/x-component "public";
~audio/ "public";
~video/ "public";
~image/ "public";
~font/ "public, immutable";
}
add_header Cache-Control $cache_control;
Though with a little more reading, I've realised that it's not recommended to include public
in some of these cases, so that should probable be tweaked.
After putting it to the test, it appears that a value of
-1
affects the cache control header as expected but not the expires header which may lead to unexpected behaviour again.
Worth noting that the Expires header appears to be worthless, not sure why it's still being added these days. As far as I can see, Cache-Control max-age replaced Expires in HTTP/1.1, so I doubt there's anything worth supporting that still requires the Expires header.
@Dreamsorcerer Spot on. Your approach is superior.
oke
outl1ne:feature/fix-cache-expiration-for-gcp-cdn
@allantatter Could you please rebase? Or allow maintainer write permissions? I could not merge without a rebase.
@LeoColomb Done. However won't the test fail now as you changed the expected result of Cache-Control
from max-age=0
to not being there instead of no-cache
?
One thing missing here though, and I couldn't find anything when searching the project, is the use of
private
, to indicate that things should not be cached by a proxy, but should be cached by the browser.In my nginx.conf, I currently have this:
map $sent_http_content_type $cache_control { default "private, must-revalidate"; application/atom+xml "public"; application/rss+xml "public"; text/css "public"; application/javascript "public"; text/x-component "public"; ~audio/ "public"; ~video/ "public"; ~image/ "public"; ~font/ "public, immutable"; } add_header Cache-Control $cache_control;
Though with a little more reading, I've realised that it's not recommended to include
public
in some of these cases, so that should probable be tweaked.
@Dreamsorcerer Would very much appreciate a smarter content_transformation.conf, please create a PR for this as well. 👍🏻
Sorry, I've got no time, feel free to take that template and make a PR.
@emansom That is actually a really good idea. As replied by @Dreamsorcerer, don't hesitate to open that PR. I/we will help to make it acceptable if needed -- btw, thanks everyone for your inputs on this PR.