datasette
datasette copied to clipboard
Sensible `cache-control` headers for static assets, including those served by plugins
What I'm seeing
With default_cache_ttl = 86400
, I see the following:
A table view returns Cache-control: max-age=86400
:
A static asset returns no Cache-control
header:
What I expected to see
I expected the static asset to return a Cache-control
header indicating that this response can be cached.
Why this matters
I'm productionising a Datasette deployment right now and was looking into putting it behind a Varnish instance. I was surprised to see requests for static assets being served from Datasette rather than Varnish, this is what led me to look more closely at the response headers.
While Datasette serves those static assets pretty quickly, I don't see why Datasette should serve them. By their nature, static assets like images and JS files are very cacheable, so it should be easy to serve them from a cache like Varnish.
(Note that Varnish can easily be configured to override this header, enabling caching for static assets. But it would be better if this override was not necessary.)
Discussion
It seems clear to me that serving static assets without a Cache-control
header is not ideal.
I see two options here:
A. Static assets use the same logic as table / SQL views to set the Cache-control
header based on default_cache_ttl
.
B. An additional setting for static assets is introduced (default_static_cache_ttl
, say).
I agree: this is bad.
Ideally, content served from /static/
would apply best practices for static content serving - which to my mind means the following:
- Where possible, serve with a far-future cache expiry header and use an asset URL that changes when the file itself changes
- For assets without that, support conditional GET to avoid transferring the whole asset if it hasn't changed
- Some kind of sensible mechanism for setting cache TTLs on assets that don't have a unique-file-per-version - in particular assets that might be served from plugins.
Datasette half-implemented the first of these: if you view source on https://latest.datasette.io/ you'll see it links to /-/static/app.css?cead5a
- which in the template looks like this:
https://github.com/simonw/datasette/blob/dd94157f8958bdfe9f45575add934ccf1aba6d63/datasette/templates/base.html#L5
I had forgotten I had implemented this! Here is how it is calculated:
https://github.com/simonw/datasette/blob/458f03ad3a454d271f47a643f4530bd8b60ddb76/datasette/app.py#L510-L516
So app.css
right now could be safely served with a far-future cache header... only it isn't:
~ % curl -i 'https://latest.datasette.io/-/static/app.css?cead5a'
HTTP/2 200
content-type: text/css
x-databases: _memory, _internal, fixtures, extra_database
x-cloud-trace-context: 9ddc825620eb53d30fc127d1c750f342
date: Sat, 05 Mar 2022 01:01:53 GMT
server: Google Frontend
content-length: 16178
The larger question though is what to do about other assets. I'm particularly interested in plugin assets, since visualization plugins like datasette-vega
and datasette-cluster-map
ship with large amounts of JavaScript and I'd really like that to be sensibly cached by default.
The existing app_css_hash
already isn't good enough, because I built that before table.js
existed, and that file should obviously be smartly cached too.
It sounds like you can workaround this with Varnish configuration for the moment, but I'm going to bump this up the list of things to fix - it's particularly relevant now as I'd like to get a solution in place before Datasette 1.0, since it's likely to be beneficial to plugins and hence should be part of the stable, documented plugin interface.
Hah, found a TODO about this: https://github.com/simonw/datasette/blob/c5791156d92615f25696ba93dae5bb2dcc192c98/datasette/app.py#L997-L999