static-web-server icon indicating copy to clipboard operation
static-web-server copied to clipboard

ETag / 304 unmodified support?

Open onionhammer opened this issue 3 years ago • 10 comments

It seems like basically everything right now every html page seems to come back with cache control max age of 86400, if you deploy a new version of your app, most bundlers will hash asset names to cachebust other than your HTML files.

Describe the solution you'd like Support etags, so people can configure HTML files and other non-hashed files to send not modified responses.

Describe alternatives you've considered Using a different static web server

onionhammer avatar Feb 15 '23 14:02 onionhammer

Yes, definitely a nice to have feature.

joseluisq avatar Feb 15 '23 14:02 joseluisq

I'm not a rustacean, but I may give it a go for fun next weekend; this source code looks pretty clean & self explanatory; if you have any thoughts or advice on direction/libs to utilize for high-perf file hashing (to generate an etag) or an algorithm, maybe post it here.

onionhammer avatar Feb 15 '23 14:02 onionhammer

I'm not a rustacean, but I may give it a go for fun next weekend; this source code looks pretty clean & self explanatory.

Sure, the code in general is easy to jump-in, so don't hesitate to give it a try.

if you have any thoughts or advice on direction/libs to utilize for high-perf file hashing (to generate an etag) or an algorithm, maybe post it here.

I found interesting this Actix file content, so you could have it a look at https://github.com/actix/actix-web/blob/master/actix-files/src/named.rs#L378

Just let me know here if you need any other assistance on your SWS journey.

joseluisq avatar Feb 15 '23 15:02 joseluisq

I looked into it, I think it would honestly be very little effort to put in defaults just for the common use case of "bundled app with static assets that have cache-busting names", but actually putting in a real etag (not weak etag - with the W/ prefix) could be a lot of work.

The thing that lead me to rethink my decision to plop in a quick-and-easy new setting, was I think adding etag on top of what you have today would just add technical debt on top of the hardcoded cache preferences that are already in the program. Probably prior to etag support comes, the program will need more robust cache configuration options

onionhammer avatar Feb 18 '23 01:02 onionhammer

I looked into it, I think it would honestly be very little effort to put in defaults just for the common use case of "bundled app with static assets that have cache-busting names", but actually putting in a real etag (not weak etag - with the W/ prefix) could be a lot of work.

What exactly could be a lot of work in your opinion? Do you mean an algo to generate/validate strong etags efficiently? I think that the entity_tag crate could help here.

The thing that lead me to rethink my decision to plop in a quick-and-easy new setting, was I think adding etag on top of what you have today would just add technical debt on top of the hardcoded cache preferences that are already in the program.

The current cache-control preferences are already part of the general section settings offered by SWS as one of several defaults. But, cache-control is behind a simple bool flag. So, it should not be a big deal against a new etag flag. Also, headers like cache-control can be overwritten by users on demand.

Probably prior to etag support comes, the program will need more robust cache configuration options

I think that the etag feature could have its own advanced section place. So, users can have robust cache configuration options and the freedom to tweak them as needed. Also, SWS advanced options are only available via the config.toml file.

If you have ideas about how to improve the cache mechanism of SWS, please feel free to share them.

joseluisq avatar Feb 19 '23 22:02 joseluisq

Just to let know here that I started to work on a module to compute/check strong Etags via the ahash crate under an advanced TOML entry. I eventually will integrate that module on SWS in a couple of releases or so, stay tuned.

joseluisq avatar Mar 04 '23 13:03 joseluisq

Turned out that it got more tricky than I expected. You can find more context and progress on this thread https://users.rust-lang.org/t/how-to-compute-a-byte-for-byte-etag-using-tokio/92481

But besides, until exploring Trailer. We could probably re-think the idea of going with a weak ETag instead as several implementations do. E.g https://github.com/actix/actix-web/blob/master/actix-files/src/named.rs#L377-L407

So I will leave it like this for now until I find time again to probably come back to the topic. In the meantime, If someone would like to help, please feel free to go ahead.

joseluisq avatar Apr 13 '23 22:04 joseluisq

It is probably possible to generate a weak ETag without having to read the whole body of a file.

It can be done, for example, this way (just adding a patch on top of 8c6ab533fd23fb766c13ceddd2b0640b776ad4db as it's not worth a PR):

diff --git a/src/static_files.rs b/src/static_files.rs
index 6165696..df726e3 100644
--- a/src/static_files.rs
+++ b/src/static_files.rs
@@ -597,9 +597,10 @@ async fn response_body(
     conditionals: Conditionals,
 ) -> Result<Response<Body>, StatusCode> {
     let mut len = meta.len();
-    let modified = meta.modified().ok().map(LastModified::from);
+    let modified = meta.modified().ok();
+    let last_modified_header = modified.map(LastModified::from);
 
-    match conditionals.check(modified) {
+    match conditionals.check(last_modified_header) {
         Cond::NoBody(resp) => Ok(resp),
         Cond::WithBody(range) => {
             let buf_size = optimal_buf_size(meta);
@@ -645,9 +646,12 @@ async fn response_body(
                     resp.headers_mut().typed_insert(ContentType::from(mime));
                     resp.headers_mut().typed_insert(AcceptRanges::bytes());
 
-                    if let Some(last_modified) = modified {
+                    if let Some(last_modified) = last_modified_header {
                         resp.headers_mut().typed_insert(last_modified);
                     }
+                    if let Some(last_modified) = modified {
+                        insert_etag(&mut resp, last_modified);
+                    }
 
                     Ok(resp)
                 })
@@ -663,6 +667,19 @@ async fn response_body(
     }
 }
 
+/// Inserts the ETag header based on the provided modified time. The value adeed is a weak ETag
+/// that also contains a fixed prefix `LMT` (Last Modified Time) to allow for future changes of the
+/// format of the ETag.
+///
+/// An example value might look like: `W/"LMT1705668081589469821"`.
+fn insert_etag(resp: &mut Response<Body>, modified: SystemTime) {
+    if let Ok(elapsed) = modified.duration_since(std::time::UNIX_EPOCH) {
+        let etag = format!(r#"W/"LMT{}""#, elapsed.as_nanos());
+        resp.headers_mut()
+            .typed_insert(headers::ETag::from_str(&etag).unwrap());
+    }
+}
+
 struct BadRange;
 
 fn bytes_range(range: Option<Range>, max_len: u64) -> Result<(u64, u64), BadRange> {
diff --git a/tests/compression_static.rs b/tests/compression_static.rs
index f614c6c..e8b43a0 100644
--- a/tests/compression_static.rs
+++ b/tests/compression_static.rs
@@ -66,6 +66,14 @@ async fn compression_static_file_exists() {
         assert_eq!(headers["content-encoding"], "gzip");
         assert_eq!(headers["accept-ranges"], "bytes");
         assert!(!headers["last-modified"].is_empty());
+        {
+            let etag = headers["etag"].to_str().unwrap_or("");
+            assert!(
+                etag.starts_with(r#"W/MMS""#),
+                "should contain weak etag in {}",
+                etag
+            );
+        }
         assert_eq!(
             &headers["content-type"], "text/html",
             "content-type is not html"

We would still need to extend the Conditionals::check to support the If-None-Match and If-Match headers (on top of the "If-(Un)Modified", "Range") and respond appropriately with a 200 or 304 correctly when a browser requests the resource to be re-validated.

But the biggest problem with the ETag is that the re-validation will never be requested for at least the Cache-Control: max-age duration.

So if the objective is to be able to get a fresh resource when it changes, then just making use of an ETag will not solve this. We would need to probably remove Cache-Control or even set Cache-Control to no-store or no-cache or start doing other combinations of caching mechanisms.

I'd suggest that this complexity might be hard to justify. At the end of the day, as soon as there is caching, there is always going to be stale resource.

There are only two hard things in Computer Science: cache invalidation and naming things. 😄

dnagir avatar Jan 19 '24 15:01 dnagir

@dnagir as I said before. We can adopt a weak Etag using a combination of modified time and file size similar to your patch.

However, I should be also in favor of adjusting the Conditionals::check to evaluate at least If-None-Match, If-Match and If-Unmodified-Since headers. Maybe something like https://github.com/weihanglo/sfz/blob/master/src/http/conditional_requests.rs#L25

PR welcome.

joseluisq avatar Jan 19 '24 21:01 joseluisq

Note that a weak Etag has performance advantages which is likely why nginx uses it ({last_mod_time}-{file_size} with both parameters in hex). In particular, it’s a prerequisite if one is to use sendfile functionality to send file contents in the most efficient way possible, something that SWS might want to adapt in future.

palant avatar Apr 19 '24 11:04 palant