Plume
Plume copied to clipboard
add caching headers for posts
until we've switched to using the async version of Rocket, we'll have serious performance issues.
Most installations of Plume are already behind a (caching) proxy server. we can use this to fix the current performance issues by sending caching headers for posts pages:
How?
the posts page should send an ETag with each response.
The advantage here is that we can have updates, and comments, since this means that we send new ETag header.
to spare a lookup in the database, we could store data about whether the post's cache is invalidated within rocket (that would be a 304 response to a HEAD request)
n.b.: The (caching) proxy server admin could adjust the time themselves, we wouldn't be sending any ourselves.
Alternatives
are currently a lot more effort:
- #730
- #777
and there's also: https://github.com/SergioBenitez/Rocket/pull/1343 which is similar to #777, but on rocket level
those aren't really alternatives, since this suggestion is operating at HTTP level, it's more orthogonal.
We already have some kind of caching for static assets : https://github.com/Plume-org/Plume/blob/master/src/routes/mod.rs#L207-L270
But adding it to templates and/or federation pages could improve performances too.
does rocket derive #[head()] for these routes itself?
if we need a DB lookup to construct a body which might still be in a downstream cache, then we should define HEAD ourselves
Maybe we can have a Vec<String> storing the slugs of the posts that need an update in Rocket's state, and only show an updated version when needed.
Another solution would be to render each post to a given HTML or AP file when the post or comments change, and serve this files statically, using the modification date to generate the cache headers.
There is already caching for any templated page, with the following limitations :
- reply that page is cached only for GET request (should probably be HEAD too actually)
- page must not contains form (due to csrf tokens, but actually this should be removed due to next point)
- it actually does db lookup and generate the page, then hash it, so it saves only bandwidth, not cpu/io
This should be removed when more intelligent caching get implemented
Implementation is in src/template_utils.rs
can you propose a patch out of this situation?
and do you believe it would help us with our current performance issues, before we switch to async rocket?
I'll look into it if I got some time.
Honestly I have no idea, I'm not operating any instances so I don't have neither logs nor a flamegraph of an instance under non-synthetic load, both of which would help with a diagnosis. If someone have a day or a week worth of logs from fediverse.blog that would be interesting
I just noticed the Subscribe button on any post is actually a form, so it is never cached by clients :neutral_face: ~~I also noticed that for some reason, caching, for static or templated resources, seems broken, so it might be a good first step to find why and how much it helps~~ So it's broken on most public servers I found, but not when running a local instance, without reverse proxy or with Caddy. It might be related to this discussion
https://mailman.nginx.org/pipermail/nginx/2013-September/040425.html
The ETag header is removed if nginx changes a response returned.
That is, if you have gzip, gunzip, sub, addition, ssi or xslt filters applied to responses returned.
so… instead of adding it's own ETag for such filters, nginx just drops them?
n.b.: Apache HTTPd does not do that.
Subscribe button on any post is actually a form,
this is bad, probably. let's open a bug for that.
Apparently it does not really remove etags, but change them from strong to weak (see rfc for the exact meaning), which means it matching for "tag" or W/"tag" should maybe work. See patch if someone want to test
patch
From 9916d1692ed8ef12fd1d3ee941ed5796345a2b97 Mon Sep 17 00:00:00 2001
From: Trinity Pointard <[email protected]>
Date: Sun, 21 Jun 2020 21:17:33 +0200
Subject: [PATCH] try fixing cache with nginx
---
src/template_utils.rs | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/template_utils.rs b/src/template_utils.rs
index 26bf58b..5af9f76 100644
--- a/src/template_utils.rs
+++ b/src/template_utils.rs
@@ -63,7 +63,10 @@ impl<'r> Responder<'r> for Ructe {
let etag = format!("{:x}", hasher.finish());
if r.headers()
.get("If-None-Match")
- .any(|s| s[1..s.len() - 1] == etag)
+ .any(|s| {
+ s[1..s.len() - 1] == etag ||
+ s[3..s.len() - 1] == etag
+ })
{
Response::build()
.status(Status::NotModified)
--
2.27.0
I only have the last six hours of the access logs of fediverse.blog, but it case it helps, I can send them to you (and I can configure nginx to keep them longer).
Also, do you want me to deploy the above patch on fediverse.blog to see if it helps?
if it's only 6 hours, could you take them around 8-10pm cest, so that there is afternoon and evening? That's when I guess there is the most load.
You can deploy it, it won't break anything for sure, but I don't know if it'll help, it should only reduce bandwidth usage
I'll give you the logs this evening then. I'll deploy the patch after that, to see if there is a difference. Thank you for your help!