Support multiple outbound ports instead of only port 15001 to avoid local source port exhaustion
We hit an issue where the application has a very high connection rate, >1000 CPS. The traffic is TCP and is intercepted by the outbound listener. But since the outbound listener uses only one port, the local source ports are frequently exhausted.
Here is the local_port_range:
kubectl exec apppod -n appns -c istio-proxy -- bash -c "sysctl net.ipv4.ip_local_port_range"
net.ipv4.ip_local_port_range = 32768 60999
And since there is only one outbound port, 15001, the destination ip/port of every intercepted connection is identical, so only the source port varies in the 5-tuple. The range above gives about 28,000 usable ports, so the maximum CPS is about 28000/60s ≈ 466, where 60s is the configured TIME_WAIT duration. That means at ~466 CPS the local source ports are exhausted within 1 minute.
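The arithmetic above can be sketched directly in the shell (values taken from this pod's settings; adjust for your own kernel configuration):

```shell
# Estimate the maximum sustainable connection rate (CPS) when every
# outbound connection shares one destination (127.0.0.1:15001), so only
# the ephemeral source port varies in the 5-tuple.
low=32768
high=60999
time_wait=60   # default TCP TIME_WAIT duration in seconds

ports=$((high - low + 1))        # usable ephemeral source ports
max_cps=$((ports / time_wait))   # each port is held ~60s in TIME_WAIT
echo "usable ports: $ports, max CPS: $max_cps"
```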
There are several options to mitigate this:
- Change the client to reuse connections
- Increase the local port range (`net.ipv4.ip_local_port_range`)
- Reduce the TIME_WAIT duration or enable `tcp_tw_reuse`
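The second and third options above are kernel sysctls; a minimal sketch, assuming you can run privileged commands in the pod's network namespace (the specific values here are examples, not recommendations):

```shell
# Widen the ephemeral source-port range (roughly doubles the usable ports):
sysctl -w net.ipv4.ip_local_port_range="10240 65535"

# Allow sockets in TIME_WAIT to be reused for new outbound connections:
sysctl -w net.ipv4.tcp_tw_reuse=1
```

In Kubernetes these would typically be applied via the pod's `securityContext.sysctls` (for namespaced sysctls) or an init container, rather than by hand.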
From Istio's perspective, is it possible to add more outbound ports to support a higher CPS?
[ ] Ambient [ ] Docs [ ] Dual Stack [ ] Installation [x] Networking [ ] Performance and Scalability [ ] Extensions and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure
Affected features (please put an X in all that apply)
[ ] Multi Cluster [ ] Virtual Machine [ ] Multi Control Plane
Additional context
Multiple outbound ports is tricky due to our iptables rules....are you saying the sidecar isn't reusing/pooling connections to the same 5-tuple?
The problem is not sidecar-to-upstream but application-to-sidecar, and there is no connection reuse/pooling on the application side.
In this case all of the application's outbound connections are hijacked to 127.0.0.1:15001, so there is only one destination ip/port, and it is easy to hit source port exhaustion because the local port range is only 32768 to 60999. Once the application has a high outbound CPS (~470 c/s), the local ports are used up and new TCP connections between the app and the sidecar fail.
$cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999
The TCP TIME_WAIT duration is 60s, so the maximum allowed CPS is about (60999-32768)/60 ≈ 470.
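One way to watch this happen on a live pod is to count sockets in TIME_WAIT toward the single outbound listener port (a diagnostic sketch using `ss`; the 15001 port is Istio's default outbound capture port):

```shell
# Count local sockets in TIME_WAIT whose peer is the outbound listener port.
# As this count approaches the size of the ephemeral range (~28k here),
# new connections from the app start failing (typically EADDRNOTAVAIL).
ss -tan state time-wait '( dport = :15001 )' | tail -n +2 | wc -l
```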
For now we have changed the client to use keep-alive, but there is still a potential issue with a single outbound port.
Related issue: https://github.com/istio/istio/issues/38982, which covers the same problem between gateway and sidecar.