readthedocs.org
readthedocs.org copied to clipboard
Default URL redirects are cached too aggressively
Details
- Read the Docs project URL: https://readthedocs.org/projects/pelorus/
Browser: chrome Version 103.0.5060.134 (Official Build) (arm64) OS: macOS 12.4
Steps
- Set up automation to make latest release the default url:
- Match: SemVer versions
- Version type: tag
- Action: set version as default
- Set up a github incoming webhook
- Tag a release
- View the docs at
/ - Tag another release within 24 hours
- Verify that the default version in
Advanced Settings / Global Settings / Default versionpoints to the release from the previous step - View the docs at
/
Expected Result
I am redirected to /en/$tag_from_step_5
Actual Result
Chrome loads the 302 FOUND from cache, redirecting me to /en/$tag_from_step_3.
Analysis
Here's the response headers, I assume from the initial request at step 4:
Response Headers (some omitted, if I don't know how private they are)
cache-control: public, max-age=86400
cf-cache-status: MISS
content-language: en
content-length: 0
content-type: text/html; charset=utf-8
date: Tue, 09 Aug 2022 18:33:36 GMT
expires: Wed, 10 Aug 2022 18:33:36 GMT
location: https://pelorus.readthedocs.io/en/v1.6.0/
referrer-policy: no-referrer-when-downgrade
server: cloudflare
vary: Accept-Language, Cookie, Accept-Encoding
x-content-type-options: nosniff
x-rtd-domain: pelorus.readthedocs.io
x-rtd-project: pelorus
x-rtd-project-method: subdomain
x-rtd-redirect: system
x-rtd-version-method: path
x-served: Django-Proxito
Chrome caches redirects, but will honor cache-control headers on the redirect response. If I understand how this works, max-age=86400 means the redirect will be cached for 24 hours.
Perhaps conditional cache headers should be used for the top-level URL, instead of max-age in cache-control. Last-Modified or ETag (generated based on the default version) should work, if cloudflare can handle those. Note: The cache for the Location (the actual version, /en/$version) can stay the same! This isn't about that.
I do not believe this is related to #9167. The tagged version is fresh and served correctly. According to chrome, it doesn't even make the request because of the cache-control header. That issue says "a few minutes"-- the redirect is still pointing to the older version, despite the automation changing the settings 1 hour and 16 minutes ago at time of writing.
I'm not sure how this affects other browsers. Perhaps they're less aggressive at caching redirects.
If needed, I can try to make a smaller (perhaps even automated) reproduction of the bug.
I think we should reduce the max-age value, maybe 10 or 5 minutes is enough, cc @ericholscher
Perhaps conditional cache headers should be used for the top-level URL, instead of max-age in cache-control. Last-Modified or ETag (generated based on the default version) should work
I think you still need to set a max-age value, from my understanding ETag and Last-Modified are just the methods the server uses to response if the response should still be cached or not. CF should already handle this for us.
Based on my skimming (not a deep read, I admit) of MDN and RFC9111, I don't think you need max-age. Unless that's a cloudflare-specific requirement.
@stsewd
I think we should reduce the max-age value, maybe 10 or 5 minutes is enough, cc @ericholscher
Are we setting this value at the application or is it done at Cloudflare? Can we reduce the max-cache only on redirect responses but keep it as is for regular pages?
@humitos that's done at Cloudflare, but we have had the same problem with normal pages, we usually ask users to do a hard refresh.
I'm not too worried about normal pages because it's easily solved with a hard refresh which is more likely for users to know about. However, telling the browser to forget a redirect is not common knowledge and may be hard to figure it out how to do it.
I'd think it makes sense to reduce the redirects cache time to 10 minutes or similar to avoid this problem. However, I wouldn't reduce too much the for regular resources because readers will have to re-download the full page each time they switch pages, tho.
I don't know if we have the ability to specify redirect-only headers, but we might be able to. We should see if there's a Page Rule or similar in the CDN that can handle this.
Looks like CF already does a version of this? https://developers.cloudflare.com/cache/how-to/configure-cache-status-code/#edge-ttl
Looks like CF already does a version of this? https://developers.cloudflare.com/cache/how-to/configure-cache-status-code/#edge-ttl
@ericholscher that's the edge caching, here the problem is with the browser caching level, which we have set to 1 day

Also, just checked the page rules and transform rules, and I don't see something that let us filter by status code https://developers.cloudflare.com/rules/transform/response-header-modification/reference/fields-functions/, there is only that option to set the cache at the edge level, but maybe that rule also overrides the age for the browser cache level?
The Browser Cache TTL sets the expiration for resources cached in a visitor’s browser. By default, Cloudflare honors the cache expiration set in your Expires and Cache-Control headers but overrides those headers if:
- The value of the Cache-Control header from the origin web server is less than the Browser Cache TTL Cloudflare setting.
From: https://developers.cloudflare.com/cache/about/edge-browser-cache-ttl/
So I think we can lower the default, and set it in our application.
I've just updated this to be 5 minutes, which should hopefully work around this issue.
Just ran across this issue, and thanks for dropping the TTL. I got bit by this because we serve JSON resources on our site and Deno respects the Cache-Control max-age when loading Javascript or JSON documents from the internet. Days or even months is fine for our versioned docs, as we will almost never rebuild, but /en/latest/ is useful to have on the order of 5-30 minutes.
Just sharing in case this use-case is not one you'd considered. It might even be nice to be separately configurable in the readthedocs.yml for tags and latest/stable.
Refs #10035
We saw an increase in our load, so we set the browser cache TTL to 20min, but we are discussing increasing the edge cache instead.
I think we can probably call this issue done, and focus on any additional TTL configuration in #10035. The original problem is solved, and we have a path forward on being a lot smarter about how we handle different files and versions, which would solve this bit of feedback:
Just ran across this issue, and thanks for dropping the TTL. I got bit by this because we serve JSON resources on our site and Deno respects the Cache-Control max-age when loading Javascript or JSON documents from the internet. Days or even months is fine for our versioned docs, as we will almost never rebuild, but /en/latest/ is useful to have on the order of 5-30 minutes.
I will make sure we consider that in #10035.