Hash-based etag layer
I'm thinking about writing a hash-based etag layer.
My plan is to:
- When a response is going to go out, hash the body with blake3 or xxhash
- Add a strong etag header with the digest
- If the request contains an
If-Matchwith a matching etag value and the response code is 200, set the code to 304 and remove the body
I probably can't contribute what I wrote to tower-http, since I'm very busy with the project that this is for, but if I wind up writing it I can definitely post or link to the code, in case a tower-http contributor is interested in getting it into tower-http. Also, even if it doesn't get into tower-http, it might give some insight into the viability of the approach.
Some reasons why this might not be a good idea:
- Calculating response body digests adds too much CPU load for not enough benefit
- Replacing the body of responses interacts badly with content encoding, or other layers
Any thoughts on the viability of this approach are welcome, especially if it's been considered for tower-http. I've been looking around for other applications that automatically add etags using a hash function, and I haven't found many. This makes me wonder if the idea is actually wrothwhile.
Yeah we could have something like that. The issue is that it requires buffering the whole response body in the middleware which doesn't work for streaming responses but for responses made using Full it should be fine.
Our axum web server is now behind cloudflare, so this is basically at the bottom of my priorities list. I started working on it, but I want to make sure I get the most out of cloudflare before this, since there are downsides, like the CPU needed to hash and the buffering.
By the way, huge thank you to everyone working on tokio, tower, and axum, and the massive long tail of dependencies which enabled us to ship something great.
We are up to 13 million requests per day, around 150 qps, and we absolutely could not have done this without tower. Everything we do is open source and CC0, so everywhere it makes sense we want to upstream.
Thank you! ❤️
We are up to 13 million requests per day, around 150 qps, and we absolutely could not have done this without tower. Everything we do is open source and CC0, so everywhere it makes sense we want to upstream.
I'm curious, who is 'we'?
We are me and the small cadre of lunatics behind casey/ord, which is the block explorer and wallet for ordinals, a Bitcoin NFT protocol that has been getting popular recently. The main feed of new NFTs is here. Things are amazingly rough, but the response has been incredible.
Oh 😐
I think it could make sense to implement this for ServeDir, where the ETag could be computed without reading everything into memory by reading the file twice, cached or even precomputed like can be done for compression.
Rather than hashing whole file, I would suggest to consider computing etag from last modified timestamp, length and optionally hash of file name
Hashing the filename is useless, since renaming the file will result in it having a different URL and cache key anyway. Hashing the length doesn't help much either, as there are types of files whose length never changes, such as the index.html built by build tools that just insert a hash of scripts into it. Hashing the last modified time has the opposite issue: It can easily change without the file itself changing.
---------- Původní e-mail ---------- Od: Douman @.> Komu: tower-rs/tower-http @.> Kopie: SvizelPritula @.>, Comment @. github.com> Datum: 3. 7. 2023 7:24:13 Předmět: Re: [tower-rs/tower-http] Hash-based etag layer (Issue #324) "
Rather than hashing whole file, I would suggest to consider computing etag from last modified timestamp, length and optionally hash of file name
— Reply to this email directly, view it on GitHub (https://github.com/tower-rs/tower-http/issues/324#issuecomment-1617373309), or unsubscribe (https://github.com/notifications/unsubscribe-auth/AHYWLAODUGSC3UE62HPL3QDXOJJPRANCNFSM6AAAAAAUGZWSVA) . You are receiving this because you commented. Message ID: <tower-rs/tower- @.***> "
Related: https://github.com/tower-rs/tower/issues/692. This seems like a pretty often requested feature
I've started on an implementation at https://github.com/billythedummy/tower-etag-cache/tree/master/tower-etag-cache. The implemented ConstLruProvider probably won't work well for streaming responses due to https://github.com/tower-rs/tower-http/issues/324#issuecomment-1406199880 but the generic traits exported by the crate (hopefully) allows for better implementations. It currently seems to work ok with the examples/simple example axum server in the repo
@billythedummy I am also interested in this feature, so I had a look at your implementation and example. Thank you for your work.
I tested the caching behaviour with the index.css file of your example app:
- Request the file index.css => 200 OK (Etag is returned)
- Request the file again (including if-none-match with matching Etag) => 304 Not Modified
- Modify the css file on the server
- Request the file again (including if-none-match with same Etag as in step 2) => 304 Not Modified
But in step 4, the server should return the new file (as it has been modified). Doesn't this defeat the purpose of the Etag middleware?
@Weltpilot thank you for the review. I wanted to just do a basic proof-of-concept with the ConstLruProvider so it has a very simple model that assumes all resources are static and don't change throughout the lifetime of the server. You can build your own cache provider with different behavior - perhaps it invalidates certain entries when files in a specific directory change or perhaps it always reruns the request and recalculates the etag
If you assume that all resources are static and don't change, why not simply always return "304 Not Modified", when the request contains a "If-None-Match" or "If-Modified-Since" header?
@Weltpilot does not change throughout the lifetime of the server. What this enables:
- user GETs index.html, server responds with document + etag
abc - i modify the contents of index.html and push the update, restarting my server. The etag of index.html is now
xyzdue to the different content - user GETs index.html again with if-none-match
abc. The new server does not see aindex.html: abcentry in its cache and responds with the new index.html + etagxyz
@billythedummy So, when you want to change a static file, you need to restart the server? Sorry, that doesn't make any sense to me. @jplatte, @davidpdrsn Maybe one of the maintainers can make a proposal how to implement an ETag middleware that has a chance of being accepted as pull request? I would opt for the suggestion by @SvizelPritula to extend ServeDir by an option to read the ETag from the file system, that has been generated in advance (similar to the compressed files).
I think the best path is to develop it as a standalone crate, unless it specifically needs private API for tower-http, that we can’t just make public.
tower-http doesn’t get much maintenance these days (personally I’m busy with work and axum) so don’t wanna add more stuff to tower-http which further burdens maintainers.