
Existing HTTP/3 connections become unusable during config update

Open • timmclean opened this issue 8 months ago • 6 comments

When a new config is pushed to Caddy using POST http://localhost:2019/load, clients with existing HTTP/3 connections to the Caddy server lose the ability to connect for a short period of time, causing active users to experience downtime.

I was able to reproduce this with a clean install on a new EC2 instance:

  1. Create a t4g.nano instance on EC2 with Ubuntu 24.04 (arm64).
  2. Open TCP ports 22, 80, 443 and UDP port 443.
  3. Install Caddy v2.9.1 for Linux arm64.
  4. Create a `config.json` file on the server (replace TESTDOMAIN with a domain you control):
```json
{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [ ":443" ],
          "routes": [
            {
              "match": [ { "host": [ "TESTDOMAIN" ] } ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "group": "group1",
                      "handle": [
                        {
                          "body": "hello world",
                          "handler": "static_response"
                        }
                      ]
                    }
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "subjects": [ "TESTDOMAIN" ]
          }
        ]
      }
    }
  }
}
```
  5. Add an A record for TESTDOMAIN that points to the public IPv4 address of the EC2 instance.
  6. Run `curl -d @config.json -H 'content-type: application/json' http://localhost:2019/load` on the server.
  7. Go to https://TESTDOMAIN in Google Chrome (tested with both Chrome for Android and desktop).
  8. "hello world" should be shown.
  9. Refresh the page several times. It should load just fine.
  10. Edit `config.json`, changing "hello world" to "hello world 2".
  11. Run `sleep 3 && curl -d @config.json -H 'content-type: application/json' http://localhost:2019/load` on the server.
  12. Quickly switch to Google Chrome and start refreshing the page rapidly. The refreshes will work until `sleep 3` finishes, at which point the page will hang and refuse to load.
  13. Navigate to https://TESTDOMAIN from a different browser and observe that the page does in fact load.
  14. Refresh Google Chrome again, and observe that the page still does not load.
  15. Wait 1-2 minutes.
  16. Refresh Google Chrome again. The page will now load correctly.

Blocking UDP port 443 seems to fix this problem (HTTP/3 runs over QUIC on UDP, so blocking it forces clients onto HTTP/2 over TCP).

Log output during config change:

```
{"level":"info","ts":1742505571.2683778,"logger":"admin.api","msg":"received request","method":"POST","host":"localhost:2019","uri":"/load","remote_ip":"127.0.0.1","remote_port":"40602","headers":{"Accept":["*/*"],"Content-Length":["885"],"Content-Type":["application/json"],"User-Agent":["curl/8.5.0"]}}
{"level":"info","ts":1742505571.2690651,"logger":"admin","msg":"admin endpoint started","address":"localhost:2019","enforce_origin":false,"origins":["//[::1]:2019","//127.0.0.1:2019","//localhost:2019"]}
{"level":"info","ts":1742505571.269525,"logger":"http.auto_https","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
{"level":"info","ts":1742505571.269545,"logger":"http.auto_https","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
{"level":"info","ts":1742505571.2697465,"logger":"http","msg":"enabling HTTP/3 listener","addr":":443"}
{"level":"warn","ts":1742505571.2697577,"msg":"quic listener tls configs are more than 2","number of configs":3}
{"level":"info","ts":1742505571.2697651,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
{"level":"warn","ts":1742505571.269796,"logger":"http","msg":"HTTP/2 skipped because it requires TLS","network":"tcp","addr":":80"}
{"level":"warn","ts":1742505571.2698002,"logger":"http","msg":"HTTP/3 skipped because it requires TLS","network":"tcp","addr":":80"}
{"level":"info","ts":1742505571.269803,"logger":"http.log","msg":"server running","name":"remaining_auto_https_redirects","protocols":["h1","h2","h3"]}
{"level":"info","ts":1742505571.2698061,"logger":"http","msg":"enabling automatic TLS certificate management","domains":["caddybugtest.timmclean.net"]}
{"level":"info","ts":1742505571.2698162,"logger":"http","msg":"servers shutting down with eternal grace period"}
{"level":"info","ts":1742505571.2710686,"msg":"autosaved config (load with --resume flag)","file":"/var/lib/caddy/.config/caddy/autosave.json"}
{"level":"info","ts":1742505571.2711222,"logger":"admin.api","msg":"load complete"}
{"level":"info","ts":1742505571.2733417,"logger":"admin","msg":"stopped previous server","address":"localhost:2019"}
```

timmclean • Mar 20 '25 21:03

Thanks for the instructions; I will give it a try soon... in the meantime, does this only happen with Chrome?

(What about curl with http/3, or Firefox?)

mholt • Mar 20 '25 21:03

I just tried Firefox and the behaviour there is a bit better. When the config update goes through, it seems to hesitate for a moment and then fall back to HTTP/2 transparently. When I check the requests in the Network tab, I can see that they use HTTP/3 before the config change and HTTP/2 after it. I'm guessing the hesitation is because Firefox is setting up a new connection.

I haven't observed any issues with `curl --http3-only` so far in my testing. curl opens a new connection on every invocation, so that makes sense to me.

Client versions (desktop):

timmclean • Mar 20 '25 22:03

HTTP/3 support in stable Chrome is spotty. I have had better results with Chrome Dev, where it seems to be more consistent and stable.

crrodriguez • Mar 23 '25 13:03

We're seeing this issue as well with Chrome and also Arc (which is Chromium-based IIRC).

We'll just turn off HTTP/3 in Caddy for now. Is that the recommended workaround?

jph00 • Apr 18 '25 22:04

That should work around it by avoiding the glitchy code paths, yeah. I haven't had a chance yet to dig into this, but it does seem odd that it's mainly Chrome-only.

mholt • Apr 19 '25 03:04

We also found that changing the "Experimental QUIC protocol" flag in chrome://flags from "Default" to "Disabled" fixes the problem, btw. (It surprised me that QUIC was enabled by default in the release version of Chrome, despite being labelled experimental.)

jph00 • Apr 19 '25 19:04

I tried upgrading from Caddy v2.8.4 to v2.9.0, v2.9.1, and v2.10.0, and the problem with HTTP/3 connections becoming unusable (requests hang as pending in the browser) after a config update persists in all of these versions. The problem is only observed in Chrome.

Disabling the QUIC protocol or removing HTTP/3 (h3) from the Caddy configuration resolves the issue.
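For reference, a minimal sketch of the server-side workaround applied to the `srv0` server from the reproduction config, assuming that JSON config is the one in use. Caddy's HTTP server object accepts a `protocols` list (the log in the original report shows the default, `["h1","h2","h3"]`), and leaving out `"h3"` should keep Caddy from enabling its HTTP/3 (QUIC) listener on UDP 443. Only the server object is shown; the `...` is a placeholder for the unchanged `routes` array, not literal JSON.

```json
"srv0": {
  "listen": [ ":443" ],
  "protocols": [ "h1", "h2" ],
  "routes": [ ... ]
}
```

Pushing this through the same `/load` request used in the reproduction steps should leave clients on HTTP/1.1 or HTTP/2 over TCP, which sidesteps the HTTP/3 code path entirely at the cost of losing HTTP/3 for all clients.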

Is this being actively investigated? Are there any updates on potential fixes or patches?

sn4dder • Aug 07 '25 14:08

I'm a bit swamped and haven't noticed this issue for myself yet. Can anyone (esp. anyone experiencing the issue) help look into it?

mholt • Aug 07 '25 23:08

I had disabled HTTP/3 due to the same issue, but it seems to have been fixed in version 2.10.1.

  • caddyhttp: Free up quic listener when stopping (https://github.com/caddyserver/caddy/pull/7177)

After updating, I verified that it works perfectly.

athene20 • Aug 26 '25 00:08

I was hoping that would be the case. 😃 We have @WeidiDeng to thank for that!

mholt • Aug 26 '25 02:08

Confirmed fixed for me in v2.10.1 with Chrome 🥳

Interestingly, I still see the downgrade-to-HTTP/2 behaviour in Firefox whenever there is a server config change. The fallback now seems to happen transparently and without delay, though, so I would not consider it an issue, since it is not visible to users.

timmclean • Aug 26 '25 22:08

@timmclean You would need to ask the Firefox team why the h2 fallback is used in this case. This is browser-specific behavior that we cannot control. Browsers have their own preferences as to whether to use h3 instead of h2/h1.

WeidiDeng • Aug 27 '25 12:08