Existing HTTP/3 connections become unusable during config update
When a new config is pushed to Caddy using POST http://localhost:2019/load, clients with existing HTTP/3 connections to the Caddy server lose the ability to connect for a short period of time, causing active users to experience downtime.
I was able to reproduce this with a clean install on a new EC2 instance:
- Create a t4g.nano instance on EC2 with Ubuntu 24.04 (arm64).
- Open TCP ports 22, 80, 443 and UDP port 443.
- Install Caddy v2.9.1 for Linux arm64
- Create a config.json file on the server (replace TESTDOMAIN with a domain you control):
{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [ ":443" ],
          "routes": [
            {
              "match": [ { "host": [ "TESTDOMAIN" ] } ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "group": "group1",
                      "handle": [
                        {
                          "body": "hello world",
                          "handler": "static_response"
                        }
                      ]
                    }
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "subjects": [ "TESTDOMAIN" ]
          }
        ]
      }
    }
  }
}
- Add an A record for TESTDOMAIN that matches the public IPv4 address of the EC2 instance.
- Run curl -d @config.json -H 'content-type: application/json' http://localhost:2019/load on the server.
- Go to https://TESTDOMAIN in Google Chrome (tested with both Chrome for Android and desktop).
- "hello world" should be shown.
- Refresh the page a bunch. It should load just fine.
- Edit config.json, changing "hello world" to "hello world 2".
- Run sleep 3 && curl -d @config.json -H 'content-type: application/json' http://localhost:2019/load on the server.
- Quickly switch to Google Chrome and start refreshing the page rapidly. The refreshes will work until sleep 3 finishes, at which point the page will hang and refuse to load.
- Navigate to https://TESTDOMAIN from a different browser and observe that the page does in fact load.
- Refresh Google Chrome again, and observe that the page still does not load.
- Wait 1-2 minutes.
- Refresh Google Chrome again. The page will now load correctly.
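The edit-and-reload part of the steps above can be scripted so the timing is repeatable. This is only a minimal sketch, assuming the admin API is on localhost:2019 and the config has the shape shown in config.json. The function names (set_response_body, build_load_request, reload_after) are hypothetical helpers for illustration, not part of Caddy.

```python
import copy
import json
import time
import urllib.request

ADMIN_LOAD_URL = "http://localhost:2019/load"  # admin endpoint used in the report


def set_response_body(config: dict, body: str) -> dict:
    """Return a copy of config with every static_response body replaced."""
    updated = copy.deepcopy(config)

    def walk(node):
        if isinstance(node, dict):
            if node.get("handler") == "static_response":
                node["body"] = body
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(updated)
    return updated


def build_load_request(config: dict, url: str = ADMIN_LOAD_URL) -> urllib.request.Request:
    """Build the same POST that the curl command in the steps sends."""
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reload_after(config_path: str, body: str, delay_seconds: float = 3.0) -> int:
    """Wait (like `sleep 3`), then push the edited config; returns the HTTP status."""
    with open(config_path) as f:
        config = set_response_body(json.load(f), body)
    time.sleep(delay_seconds)
    with urllib.request.urlopen(build_load_request(config)) as response:
        return response.status
```

Calling reload_after("config.json", "hello world 2") on the server would stand in for the manual edit plus the sleep 3 && curl step, leaving the delay free for switching to the browser.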
Blocking UDP port 443 seems to fix this problem.
Log output during config change:
{"level":"info","ts":1742505571.2683778,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_ip":"127.0.0.1","remote_port":"40602","headers":{"Accept":["*/*"],"Content-Length":["885"],"Content-Type":["application/json"],"User-Agent":["curl/8.5.0"]}}
{"level":"info","ts":1742505571.2690651,"logger":"admin","msg":"admin endpoint started","address":"localhost:2019","enforce_origin":false,"origins":["//[::1]:2019","//127.0.0.1:2019","//localhost:2019"]}
{"level":"info","ts":1742505571.269525,"logger":"http.auto_https","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
{"level":"info","ts":1742505571.269545,"logger":"http.auto_https","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
{"level":"info","ts":1742505571.2697465,"logger":"http","msg":"enabling HTTP/3 listener","addr":":443"}
{"level":"warn","ts":1742505571.2697577,"msg":"quic listener tls configs are more than 2","number of configs":3}
{"level":"info","ts":1742505571.2697651,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
{"level":"warn","ts":1742505571.269796,"logger":"http","msg":"HTTP/2 skipped because it requires TLS","network":"tcp","addr":":80"}
{"level":"warn","ts":1742505571.2698002,"logger":"http","msg":"HTTP/3 skipped because it requires TLS","network":"tcp","addr":":80"}
{"level":"info","ts":1742505571.269803,"logger":"http.log","msg":"server running","name":"remaining_auto_https_redirects","protocols":["h1","h2","h3"]}
{"level":"info","ts":1742505571.2698061,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["caddybugtest.timmclean.net"]}
{"level":"info","ts":1742505571.2698162,"logger":"http","msg":"servers shutting down with eternal grace period"}
{"level":"info","ts":1742505571.2710686,"msg":"autosaved config (load with --resume flag)","file":"/var/lib/caddy/.config/caddy/autosave.json"}
{"level":"info","ts":1742505571.2711222,"logger":"admin.api","msg":"load complete"}
{"level":"info","ts":1742505571.2733417,"logger":"admin","msg":"stopped previous server","address":"localhost:2019"}
Thanks for the instructions; I will give it a try soon... in the meantime, does this only happen with Chrome?
(What about curl with http/3, or Firefox?)
I just tried Firefox and the behaviour there is a bit better. When the config update goes through, it seems to hesitate for a moment and then fall back to HTTP/2 transparently. When I check the requests in the Network tab, I can see that they are HTTP/3 before the config change, and then HTTP/2 after the config change. I'm guessing the hesitation is because Firefox is setting up a new connection.
I haven't observed any issues with curl --http3-only so far in my testing. It recreates connections every time, so that makes sense to me.
Client versions (desktop):
- Chrome 134.0.6998.117
- Firefox 136.0.2
- curl 8.12.1 musl
Chrome stable's HTTP/3 support is spotty. I have achieved better results with chrome-dev, where it seems to be more consistent/stable.
We're seeing this issue as well with Chrome and also Arc (which is Chromium-based IIRC).
We'll just turn off http 3 in caddy for now -- is that the recommended workaround?
That should work around it by avoiding the glitchy code paths, yeah. I haven't had a chance yet to dig into this, but it does seem odd that it's mainly Chrome-only.
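As a sketch of that workaround: in Caddy's JSON config, each HTTP server has a protocols list, and leaving "h3" out of it turns off HTTP/3 for that server. The helper below (disable_http3 is a hypothetical name) applies this to a config dict before it is posted to /load. It assumes the default list is h1, h2, h3, which matches the "server running" log lines earlier in this thread.

```python
import copy
import json


def disable_http3(config: dict) -> dict:
    """Return a copy of config whose HTTP servers no longer offer h3."""
    updated = copy.deepcopy(config)
    servers = updated.get("apps", {}).get("http", {}).get("servers", {})
    for server in servers.values():
        # Assumed default when "protocols" is unset: h1, h2 and h3.
        current = server.get("protocols", ["h1", "h2", "h3"])
        server["protocols"] = [p for p in current if p != "h3"]
    return updated


example = {"apps": {"http": {"servers": {"srv0": {"listen": [":443"]}}}}}
print(json.dumps(disable_http3(example), indent=2))
```

The resulting srv0 entry carries "protocols": ["h1", "h2"], so clients are no longer offered HTTP/3 on that server. Existing HTTP/1.1 and HTTP/2 behaviour is unchanged.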
We also found that changing "experimental quic protocol" from "default" to "disabled" in Chrome fixes the problem btw. (It surprised me that it was enabled by default in the release version of Chrome, despite being experimental.)
I tried upgrading from Caddy v2.8.4 to v2.9.0, v2.9.1, and v2.10.0, and the problem with HTTP/3 connections becoming unusable/hanging (pending in browser) after a config update persists across all versions. This problem is only observed in Chrome.
Disabling the QUIC protocol or removing HTTP/3 (h3) from the Caddy configuration resolves the issue.
Is this being actively investigated? Are there any updates on potential fixes or patches?
I'm a bit swamped and haven't noticed this issue for myself yet. Can anyone (esp. anyone experiencing the issue) help look into it?
I had disabled HTTP/3 due to the same issue, but it seems to have been fixed in version 2.10.1.
- caddyhttp: Free up quic listener when stopping (https://github.com/caddyserver/caddy/pull/7177)
After updating, I verified that it works perfectly.
I was hoping that would be the case. 😃 We have @WeidiDeng to thank for that!
Confirmed fixed for me in v2.10.1 with Chrome 🥳
Interestingly, I still see the downgrade-to-HTTP/2 behaviour in Firefox whenever there is a server config change. The fallback seems to happen transparently and without delay now though, so I would say this is not an issue as it is not visible to users.
@timmclean You would need to ask the Firefox team why the h2 fallback is used in this case. This is browser-specific behavior that we cannot control. Browsers have their own preferences as to whether to use h3 instead of h2/h1.