vegeta icon indicating copy to clipboard operation
vegeta copied to clipboard

Add optional feature to set the SO_REUSEADDR option before binding th…

Open fabiorush opened this issue 2 years ago • 8 comments

Background

Add an optional parameter to set the SO_REUSEADDR option on the TCP socket before binding it (default false). Can reduce the amount of "bind: address already in use" errors when doing many connections with the same address and port combination, like the ones that happened on https://github.com/tsenart/vegeta/issues/583.

Checklist

  • [x] Git commit messages conform to community standards.
  • [x] Each Git commit represents meaningful milestones or atomic units of work.
  • [ ] Changed or added code is covered by appropriate tests.

fabiorush avatar Jul 25 '23 15:07 fabiorush

Hey, thanks for the patch. I have couple question.

  1. Does this work on every OS/architecture pair we ship releases to? If not, that's fine. We just need to document it properly and make sure this only runs on the supported ones. See the latest release for the supported targets: https://github.com/tsenart/vegeta/releases/tag/v12.11.0
  2. Are there any risks / trade-offs involved with setting this on outgoing connections? I've used this when binding to a port to receive incoming requests but never to outgoing connections.

tsenart avatar Jul 25 '23 17:07 tsenart

Here's what GPT4 had to say about this: https://chat.openai.com/share/b515e3dc-6193-4550-8d5d-3ecaa8088755

tsenart avatar Jul 25 '23 17:07 tsenart

Yeah, I'm aware of the risks and the follow article explain it really well:

https://hea-www.harvard.edu/~fine/Tech/addrinuse.html

I think the main risk is to have a socket A receiving packets that should be received by socket B, but I think it doesn't matter much for the load test itself (it would matter for the report in the end, but I think we could add a note about that).

fabiorush avatar Jul 25 '23 18:07 fabiorush

And this post on stackoverflow talks about the different implementation of the REUSEADDR and REUSEPORT flags on many architectures. In resume the REUSEADDR flag stays the same, while we have some minor variations on the REUSEPORT flag.

https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ

fabiorush avatar Jul 25 '23 18:07 fabiorush

About the suggestions made by GPT4, something we cannot or are not allowed to modify sys net ipv4 parameters. For example, decreasing the tcp_fin_timeout will also affect other services, which may not be desirable.

fabiorush avatar Jul 25 '23 20:07 fabiorush

I need to convince myself that this is safe, and so far I haven't been able to. Maybe you can help me get there. If someone runs into "bind: address already in use" even given the fact that we're using HTTP keep alive underneath and re-using TCP connections, it like means the server is very slow to respond, or the network really faulty, or a mix. In either case, the TCP connection would still be open, and the client reading a response, so re-using the local address seems unsafe?

Or is it only applicable when the underlying TCP connection is in TIME_WAIT state? In any case, I'd like to write some sort of integration test that exercises this and gives us some more confidence.

tsenart avatar Jul 30 '23 15:07 tsenart

Exactly. On our use case we were executing a load test to a load balancer composed of 8 front end servers. Those servers queries a route table located on another server and do a proxy_pass to one internal server out of 20. We were trying to find out what happens when the communication to the route table service becomes unresponsible. Because how the retries were implemented a request that took milliseconds to be responded went to up 5 seconds. We wanted to know if the front end servers would crash or also become unresponsible but we started to runs into "bind: address already in use". The setting of the reuse option fixed this on our case. It only allows the reuse of TIME_WAIT sockets when you are binding to different combination of remote IP address and port. So if you have a socket in TIME_WAIT with the combination LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_A:PORT_X and try the same combination again it will not allow, but if use a combination with another remote IP address or port like LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_B:PORT_X it will reuse the LOCAL_ADDR_A:PORT_A when binding.

fabiorush avatar Aug 01 '23 12:08 fabiorush

If you're hitting "bind: address in use" issues in your load tests, it typically means you're exhausting the capabilities of the network stack on the host. SO_REUSEADDR and SO_REUSEPORT won't meaningfully impact those constraints. And, in any case, it's a situation that you would — should! — never encounter in a normal deployment, so it doesn't make sense to try to accommodate in a vegeta load test.

edit: vegeta acts as a client, not a server, and SO_REUSEADDR/PORT on the client side is far more restrictive than on the server side. Specifically, it doesn't allow a single physical connection from a given client addr:port to a given server addr:port to mux arbitrary logical connections.

Instead, you want to make sure that vegeta creates no more than a reasonable number of connections to a given target host. The best way to do that is by setting -max-connections to a value like 8 or 16 or 32. And if that means you can't do as many RPS as you want, then you need to run vegeta on multiple hosts.

peterbourgon avatar Aug 04 '23 01:08 peterbourgon