ephemeral-port-reserve
FYI, see also python_portpicker (PyPI: portpicker) and the concept of a portserver:
https://github.com/google/python_portpicker/
The additional feature it provides is integration with a portserver (it ships an example portserver): when one is present, the portserver manages the lifetime of port reservations on a host, keeping each reservation for as long as the requesting process exists. Scenario: run a portserver on your CI/testing hosts and developer workstations, and you can avoid random port conflicts even when multiple processes are seeking free ports at the same time, before any of them has bound its port.
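To make the "reservation lives as long as the requesting process" idea concrete, here is a toy, in-memory model of that bookkeeping in Python. It is not portpicker's implementation or protocol; `ToyPortServer` and `pid_is_alive` are hypothetical names, there are no sockets involved, and the liveness check assumes a POSIX system where `os.kill(pid, 0)` tests for existence without delivering a signal.

```python
import os
from typing import Dict, Iterable, List, Optional


def pid_is_alive(pid: int) -> bool:
    """POSIX-only liveness check: signal 0 tests existence without sending anything."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


class ToyPortServer:
    """Hypothetical model of a portserver's bookkeeping (no networking)."""

    def __init__(self, ports: Iterable[int]) -> None:
        self._free: List[int] = list(ports)
        self._owner: Dict[int, int] = {}  # port -> PID that requested it

    def _reclaim_dead(self) -> None:
        # Return ports whose requesting process no longer exists to the pool.
        for port, pid in list(self._owner.items()):
            if not pid_is_alive(pid):
                del self._owner[port]
                self._free.append(port)

    def request(self, pid: int) -> Optional[int]:
        """Hand out a port reserved for `pid`, or None if the pool is exhausted."""
        self._reclaim_dead()
        if not self._free:
            return None
        port = self._free.pop(0)
        self._owner[port] = pid
        return port
```

Because a port is only handed back once its owner has exited, two processes racing for "a free port" can never be given the same one. A real portserver also has to deal with details this toy ignores, such as PID reuse and checking that a port is actually bindable before handing it out.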
The TIME_WAIT trick this code uses, combined with the typical Linux default FIN timeout, may mean that rarely if ever happens in some situations... but as hosts get huge and have hundreds of threads all simultaneously running integration tests, there's only so much room in the 16-bit port namespace.
Thanks, though such a thing shouldn't be necessary even on a large system. The TIME_WAIT state ensures the port is taken and won't be released within the timeout (try, for instance, running this tool in a loop; you'll quickly reach exhaustion).
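A minimal Python sketch of the TIME_WAIT trick being discussed, assuming Linux-like TCP semantics. It reconstructs the technique and is not necessarily the package's exact code; `reserve_port` is a hypothetical name. The idea: bind to port 0 so the kernel picks an ephemeral port, connect to it, accept, and close the accepted socket first so that side performs the active close and its port lands in TIME_WAIT. While in TIME_WAIT the kernel won't hand that port out as an ephemeral port again, but a process that sets SO_REUSEADDR can still bind it explicitly.

```python
import contextlib
import socket


def reserve_port(ip: str = "127.0.0.1") -> int:
    """Pick an ephemeral port and park it in TIME_WAIT (sketch of the trick)."""
    with contextlib.closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((ip, 0))  # port 0: let the kernel choose an ephemeral port
        listener.listen(1)      # backlog > 0 so the connect below can complete
        addr = listener.getsockname()
        with contextlib.closing(socket.create_connection(addr)) as _client:
            server_side, _ = listener.accept()
            # Closing the accepted socket before the client makes the server
            # side do the active close, so addr[1] ends up in TIME_WAIT.
            server_side.close()
        return addr[1]
```

Calling it in a tight loop illustrates the exhaustion point: every call parks one more port in TIME_WAIT for roughly a minute.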
We run portservers everywhere. Our automation systems use a FIN timeout notably lower than 60 seconds. It is not impossible for a component to take longer than tcp_fin_timeout seconds to start up and bind to its port. Our test infrastructure is large and busy enough that this happens at our scale. =) Hence the portserver.
(regardless, I do like your TIME_WAIT state trick)
FYI, digging a little further: tcp_fin_timeout is not the TIME_WAIT duration in Linux. It just happens to default to the same value, and changing tcp_fin_timeout does not change the TIME_WAIT constant. So it does seem to be 60 seconds: https://github.com/torvalds/linux/blob/master/include/net/tcp.h#L123
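To check what a particular Linux host uses, a small sketch that reads the tcp_fin_timeout sysctl, which per the kernel's ip-sysctl documentation governs how long an orphaned connection stays in FIN_WAIT_2. The fixed 60 seconds for TIME_WAIT comes from TCP_TIMEWAIT_LEN, defined as (60*HZ) in include/net/tcp.h; the helper and constant names here are hypothetical.

```python
from pathlib import Path
from typing import Optional

# TIME_WAIT length is the compile-time TCP_TIMEWAIT_LEN (60*HZ), not a sysctl.
TCP_TIMEWAIT_LEN_SECONDS = 60


def read_tcp_fin_timeout(
    path: Path = Path("/proc/sys/net/ipv4/tcp_fin_timeout"),
) -> Optional[int]:
    """Return the FIN_WAIT_2 timeout in seconds, or None off Linux or if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

On a stock kernel both numbers read 60, which is why they are easy to conflate; lowering the sysctl changes only the first.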
That leaves exhaustion as the only downside (which a server could cause anyway, depending on how it uses its serving sockets).
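A back-of-the-envelope check on how quickly exhaustion arrives: if every reservation holds a port in TIME_WAIT for a fixed period, the ephemeral range can sustain at most (range size) / (TIME_WAIT length) reservations per second. The sketch below assumes Linux's /proc/sys/net/ipv4/ip_local_port_range; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_local_port_range(
    path: Path = Path("/proc/sys/net/ipv4/ip_local_port_range"),
) -> Optional[Tuple[int, int]]:
    """Return (low, high) of the Linux ephemeral port range, or None if unavailable."""
    try:
        low, high = (int(field) for field in path.read_text().split())
    except (OSError, ValueError):
        return None
    return low, high


def max_sustained_reservations_per_second(
    low: int, high: int, time_wait_s: float = 60.0
) -> float:
    """Upper bound: each reserved port is unavailable for time_wait_s seconds."""
    pool_size = high - low + 1
    return pool_size / time_wait_s
```

With the common Linux default range of 32768 to 60999, that is 28232 ports / 60 s, or about 470 reservations per second; the real ceiling is lower because outgoing connections draw from the same pool.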
The code you pointed to has, as its very next line, #define TCP_FIN_TIMEOUT TCP_TIMEWAIT_LEN. A brief search shows that TCP_FIN_TIMEOUT is the initial value of net->ipv4.sysctl_tcp_fin_timeout, so I'm not convinced this is coincidental.
What made you lower your FIN timeout? The only reason I can think of is port exhaustion related to TIME_WAIT, but I can't think of any non-pathological reason that might happen. Personally, I would leave the timeout at 60s and view FIN_TIMEOUT errors as a natural load balancer: if the machine is so congested that the workload can't even bind its port, it should give up and be rescheduled elsewhere anyhow.
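One way to act on the "give up and be rescheduled" suggestion, sketched under assumptions: `bind_or_give_up` is a hypothetical helper that binds with SO_REUSEADDR (as a process taking over a reserved port would) and returns None on EADDRINUSE instead of retrying, leaving the caller to exit so a scheduler can place the job elsewhere.

```python
import errno
import socket
from typing import Optional


def bind_or_give_up(port: int, host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Return a listening socket on (host, port), or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None  # congested host: let the caller bail out
        raise            # anything else is a real error
    return sock
```

A caller might run `sys.exit(os.EX_TEMPFAIL)` (available on Unix) when it gets None; which exit code your scheduler treats as "retry elsewhere" is an assumption to check against your own setup.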