reverseproxy: Close hijacked conns on reload/quit
Previously, if Caddy was proxying WebSockets, and the client closed the connection, Caddy would properly proxy the Close message to the server, and vice-versa if the server closed the connection to the client. This is all fine. However, we didn't really take care of cleaning up these hijacked connections in the event of a config reload (where Caddy needs to close the connection).
Hijacked connections don't get closed by the standard library when we call http.Server.Shutdown(), so we have to do that ourselves.
We also send a Close control message to both ends of WebSocket connections. I have tested this many times in my dev environment with consistent success, although the variety of scenarios was limited.
Should close #4762.
This could use a lot of testing in production under heavy traffic. For example, I don't know if the 'graceful' WebSocket close feature works properly with concurrent writes (i.e. the proxy is still actively copying a response from the backend while it also injects its Close frame).
I thought of many ways to implement this, and I opted to start with the simplest that worked. Please try it out, and report any issues with it, thanks!
FYI @dunglas in case you have any input here regarding Server-Sent Events. Is it safe to just straight up close SSE connections, or should we be sending some kind of message to either end to terminate cleanly? I assume not, because it's just a regular HTTP request, but I want to make sure.
Just closing the connection is good enough, there is no need for a specific message. The client will automatically try to reconnect (which is the expected behavior).
Event stream requests can be redirected using HTTP 301 and 307 redirects as with normal HTTP requests. Clients will reconnect if the connection is closed; a client can be told to stop reconnecting using the HTTP 204 No Content response code.
I don't think the sync.Map delete usage is correct here, per that issue and my personal experience with v2ray account management, which uses sync.Map. If delete is used to remove entries, memory will continue to grow and ultimately lead to OOM. I once stress-tested v2ray by repeatedly adding 1k users and removing them; after 2 or 3 rounds v2ray would OOM, even with 30-minute pauses in between to allow GC to kick in.
That GitHub issue was closed stating that keys will be removed when calling LoadAndDelete. I never tried it, because I developed my own Caddy plugin to handle proxy traffic.
@WeidiDeng Yikes, good to know. I'm not totally clear on the issue either, but given the documentation for sync.Map, your experience, and the links you posted, I'm going to err on the safe side and use a regular map with a sync.Mutex instead.
This branch is ready for testing!
Recently, I have been using forward_auth to develop my own authorization for ttyd, which uses WebSockets for communication. I use ttyd to manage the server, including Caddy's configuration. Since Caddy always restarts all apps when the configuration changes, I wonder whether closing the WebSocket is actually unnecessary in this case. But the current Caddy module lifecycle doesn't permit this.
Yeah. We'd love to keep websocket connections open across config reloads, but the problem is that we don't have a way to relate things in the old config to the new config.
For example, what if the reverse_proxy config changes, e.g. the upstreams change and the one the WebSocket connection was using is removed? We have to close such connections so we don't leak them. And since we have no way to identify a particular proxy in the config (you might have more than one proxy), we can't safely compare the old vs. new upstreams. That wouldn't work with the dynamic upstreams feature either.
I see this doesn't follow caddyhttp.App's GracePeriod; perhaps there needs to be a delay when cleaning up. caddyhttp.ServerCtxKey can be used on an http.Request to extract the relevant caddyhttp.App instance. And when it's 0, clean up immediately.
But this is semantically different from the HTTP app's GracePeriod, which, when zero, means waiting indefinitely for requests to complete.
@WeidiDeng
I see this doesn't follow caddyhttp.App's GracePeriod; perhaps there needs to be a delay when cleaning up.
True, we don't honor the grace period for hijacked connections. That's probably an area for improvement in a future PR. I am not sure how complicated that may be.
This has reportedly been working very well.