TCP congestion control issues (e.g. retransmissions, out-of-order ACKs, dropped packets)
Caddy appears to have TCP_NODELAY enabled -- i.e. Nagle's algorithm disabled (that's the default for Go sockets, and I don't see a TCPConn.SetNoDelay(false) call in the codebase) -- which can exacerbate issues when the path is congested and many small packets are being sent. TCP_NODELAY shouldn't be used for web servers; it's meant for latency-sensitive applications such as SSE, SSH, VNC, or other services where throughput can be sacrificed.
When the server's network, a network in between, or the client's network (especially a crowded WiFi network or channel) is under extreme congestion, Caddy exacerbates the issue by using TCP_NODELAY, causing an extreme loss of throughput -- up to 99% -- and a bad experience for the end user.
This is mostly an issue when proxying or running CGI services, where Caddy can't control the buffering and ensure packets are as full as possible (in other words, file_server appears to behave mostly well, though it does suffer greatly above the 90th percentile).
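If you want to verify the default yourself, here's a Linux-only sketch that reads TCP_NODELAY back from a freshly dialed Go socket (the loopback listener is only there to make the snippet self-contained):

```go
package main

import (
	"fmt"
	"net"

	"golang.org/x/sys/unix"
)

func main() {
	// Throwaway local listener so there's something to connect to.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	raw, err := conn.(*net.TCPConn).SyscallConn()
	if err != nil {
		panic(err)
	}
	_ = raw.Control(func(fd uintptr) {
		v, err := unix.GetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_NODELAY)
		// Prints 1 on an out-of-the-box Go TCP socket, i.e. Nagle's is off.
		fmt.Println("TCP_NODELAY =", v, err)
	})
}
```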
Interesting.
Would you be willing to help benchmark this to compare?
Sounds like a trivial change. We might need to add a new option to make this configurable, in case this causes a regression for someone, and they want to continue using the previous behaviour.
I'll have to look at the code, and Go's API for changing this; we don't usually have direct access to the TCP connection, I think, unless we hijack the request (e.g. a WebSockets connection upgrade), but hopefully there's an easy way to configure this.
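One possibility that comes to mind (just a sketch, not how Caddy actually wires its listeners): net/http exposes a ConnContext hook that receives the raw net.Conn for every accepted connection, so the TCPConn can be reached without hijacking. The address and handler below are placeholders:

```go
package main

import (
	"context"
	"net"
	"net/http"
)

func main() {
	srv := &http.Server{
		Addr: ":8080",
		// ConnContext runs once per accepted connection, before any request
		// is served, so it sees the raw net.Conn that handlers never do.
		ConnContext: func(ctx context.Context, c net.Conn) context.Context {
			if tc, ok := c.(*net.TCPConn); ok {
				// Re-enable Nagle's algorithm; Go's default is NoDelay(true).
				_ = tc.SetNoDelay(false)
			}
			return ctx
		},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			_, _ = w.Write([]byte("ok\n"))
		}),
	}
	_ = srv.ListenAndServe()
}
```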
> Would you be willing to help benchmark this to compare?
I'd love to! I also have a git repo w/ a Docker Compose setup I used to get to the bottom of this. I'll spice up the readme and get it pushed to GitHub tomorrow.
Based on this comment by John Nagle himself, it sounds like delayed ACK should be off by default; usually, though, Nagle's algorithm and delayed ACKs are both on at the same time, which is what causes problems. In other words, disabling delayed ACK (using TCP_QUICKACK) while keeping Nagle's algorithm seems like it should be the preferred default -- but instead, Go disables Nagle's algorithm by default by enabling TCP_NODELAY. I've also read some trains of thought, though, that if you do your own buffering right, you don't need Nagle's algorithm either.
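For reference, the rule Nagle's algorithm applies is roughly the following (a simplified sketch, not kernel code; the function and parameter names are made up):

```go
// nagleShouldSend sketches the classic decision rule: undersized segments
// are held back while earlier data is still unacknowledged, so at most one
// small packet is in flight at a time.
func nagleShouldSend(pendingBytes, mss int, unackedInFlight bool) bool {
	if pendingBytes >= mss {
		return true // a full segment is always sent immediately
	}
	if !unackedInFlight {
		return true // nothing outstanding, so a small segment may go now
	}
	return false // otherwise buffer until an ACK arrives or the segment fills
}
```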
Go's API for TCP_NODELAY is quite simple: https://pkg.go.dev/net#TCPConn.SetNoDelay
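So a configurable knob could plausibly be applied at the listener level, something like this (the type and field names are hypothetical, not an actual Caddy option):

```go
package main

import (
	"net"
	"net/http"
)

// nodelayListener wraps a net.Listener and applies a configurable
// TCP_NODELAY setting to every accepted connection.
type nodelayListener struct {
	net.Listener
	noDelay bool // false re-enables Nagle's algorithm on accepted conns
}

func (l nodelayListener) Accept() (net.Conn, error) {
	c, err := l.Listener.Accept()
	if err != nil {
		return nil, err
	}
	if tc, ok := c.(*net.TCPConn); ok {
		_ = tc.SetNoDelay(l.noDelay) // Go's default is SetNoDelay(true)
	}
	return c, nil
}

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		panic(err)
	}
	// Serve HTTP over the wrapping listener with Nagle's algorithm enabled.
	_ = http.Serve(nodelayListener{Listener: ln, noDelay: false}, nil)
}
```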
I'd be open to changing this and making it configurable, but it should be backed by performance tests.
> but it should be backed by performance tests.
It is coming soon! I'm using Nginx and its tcp_nodelay configuration to show how the setting affects different kinds of networks (WiFi/mobile/wired), using Docker Compose and Pumba. Tentative results (I'm still compiling them) show that if you're targeting mobile/residential clients you might want TCP_NODELAY disabled, while for wired connections (such as inside a DC or between servers) you may want TCP_NODELAY enabled.
Thanks. Russ Cox also commented, and it seems like the Go team still believes the current default is best -- which, as I read more into things, I think I agree with. I'm open to making it configurable, however.
Maybe disabling delayed ACKs (using TCP_QUICKACK) is really what we need to do, but I'm not sure if that is available on every platform.
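On Linux it can be set per connection via a raw syscall on the fd, but it's indeed not portable: there's no TCP_QUICKACK on macOS or Windows, and Linux clears the flag again after a while, so it would have to be reapplied to stick. A rough sketch, assuming golang.org/x/sys/unix; the helper name and dial target are made up:

```go
package main

import (
	"fmt"
	"net"

	"golang.org/x/sys/unix"
)

// enableQuickACK sets TCP_QUICKACK on an established connection (Linux-only).
func enableQuickACK(conn *net.TCPConn) error {
	raw, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	var serr error
	if err := raw.Control(func(fd uintptr) {
		serr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_QUICKACK, 1)
	}); err != nil {
		return err
	}
	return serr
}

func main() {
	// example.com:80 is just a placeholder target for the sketch.
	conn, err := net.Dial("tcp", "example.com:80")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	fmt.Println(enableQuickACK(conn.(*net.TCPConn)))
}
```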
This seems to also align with Go's latest stance: https://brooker.co.za/blog/2024/05/09/nagle.html (it also quickly touches on the delayed ACKs topic).
Note that this only applies to TCP and is therefore limited to HTTP/1.0, HTTP/1.1 and HTTP/2. Most clients today will use HTTP/3, which isn't based on TCP.
Yeah. I just gave up on this since people don't seem to understand beyond surface-level what I'm even saying or just parrot some dogma.
The best solution for this is to simply not use Caddy (or Go, in general) on the edge if your clients might be on congested networks.
> Yeah. I just gave up on this since people don't seem to understand beyond surface-level what I'm even saying or just parrot some dogma.
What? We're literally saying we'd "be open to changing this and making it configurable" and "I'm open to making it configurable."
I'd still like to see the performance tests that you said were "coming soon!" so that we can feel confident about and defend the change, if asked about it later.
> The best solution for this is to simply not use Caddy (or Go, in general) on the edge if your clients might be on congested networks.
I mean, Google does. So does Netflix. And Stripe. And other huge websites. I guess you have more diverse traffic than they do?
Anyway, I guess I'll close this if interest has been lost.
> What? We're literally saying we'd "be open to changing this and making it configurable" and "I'm open to making it configurable."
👍 Sorry. I wasn't referring to this issue specifically. More or less, I was referring to the general off-GitHub feedback I got for even suggesting such a thing. So far, I've gotten four threatening emails. I've mostly just walked away from this issue as it isn't a hill worth dying on.
> I mean, Google does. So does Netflix. And Stripe. And other huge websites. I guess you have more diverse traffic than they do?
No, I just live in a highly populated area with a congested wifi. And yeah, I regularly run into issues using those services, so I suspect that is a large reason why.
> 👍 Sorry. I wasn't referring to this issue specifically. More or less, I was referring to the general off-GitHub feedback I got for even suggesting such a thing. So far, I've gotten four threatening emails. I've mostly just walked away from this issue as it isn't a hill worth dying on.
OH. Sorry. I thought somehow our discussion here was offensive or discouraging, but I understand now that's not what you meant. Really sorry to hear that.
Let us know if we can help in the future. :slightly_smiling_face:
FWIW, I believe Go has done its own implementation of something like Nagle's (1.22?) to ensure packets get filled instead of sending lots of tiny packets, just more intelligently. And anyway, h3 is def the right answer here and Caddy is on the right path with that, for sure.