Support multiple outbound ports instead of only one 15001 to avoid local source port exhaustion

Open shonecyx opened this issue 1 year ago • 2 comments

We hit an issue where an application has a very high connection rate (>1000 cps). The traffic is TCP and is intercepted by the outbound listener. Because the outbound listener has only one port, the local source ports are frequently exhausted.

Here is the local_port_range:

kubectl  exec apppod -n appns -c istio-proxy -- bash -c "sysctl net.ipv4.ip_local_port_range"
net.ipv4.ip_local_port_range = 32768      60999

Because there is only one outbound port (15001), only about 28000 distinct 5-tuples are available, so the maximum sustainable rate is about 28000 / 60s = 466 cps. The 60s here is the TIME_WAIT duration, which means that at 466 cps the local source ports are exhausted within one minute.

There are several options to optimize:

  1. Change the client to reuse connections
  2. Increase the local_port_range
  3. Reduce the TIME_WAIT duration or enable tcp_tw_reuse
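The estimate above, and the effect of option 2, can be sketched as a short calculation. This is only a rough model that assumes every closed connection holds its source port for the full TIME_WAIT period and that all connections go to a single destination ip:port. The function names and the widened range 1024-65535 are hypothetical, chosen for illustration.

```python
def max_cps(port_low, port_high, hold_seconds):
    """Rough ceiling on new connections per second to one destination ip:port.

    Assumes each closed connection keeps its source port busy for
    hold_seconds (the TIME_WAIT duration) before it can be reused.
    """
    usable_ports = port_high - port_low + 1
    return usable_ports / hold_seconds


def ports_needed(cps, hold_seconds):
    """Source ports required to sustain a given connection rate."""
    return cps * hold_seconds


# Range reported by sysctl in this issue, with a 60s TIME_WAIT.
current = max_cps(32768, 60999, 60)

# Hypothetical widened range, for comparison only.
widened = max_cps(1024, 65535, 60)

print(f"current ceiling: {current:.1f} cps")
print(f"widened ceiling: {widened:.1f} cps")
print(f"ports needed for 1000 cps: {ports_needed(1000, 60)}")
```

Under these assumptions, widening the range alone barely clears 1000 cps, which is why connection reuse on the client side is the more robust of the three options.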

From Istio's perspective, is it possible to add more outbound ports to support a higher cps?

[ ] Ambient [ ] Docs [ ] Dual Stack [ ] Installation [x] Networking [ ] Performance and Scalability [ ] Extensions and Telemetry [ ] Security [ ] Test and Release [ ] User Experience [ ] Developer Infrastructure

Affected features (please put an X in all that apply)

[ ] Multi Cluster [ ] Virtual Machine [ ] Multi Control Plane

Additional context

shonecyx avatar May 23 '24 08:05 shonecyx

Multiple outbound ports is tricky due to our iptables rules....are you saying the sidecar isn't reusing/pooling connections to the same 5-tuple?

keithmattix avatar May 23 '24 14:05 keithmattix

> Multiple outbound ports is tricky due to our iptables rules....are you saying the sidecar isn't reusing/pooling connections to the same 5-tuple?

The problem is not between the sidecar and the upstream but between the application and the sidecar, and there is no connection reuse/pooling on the application side.

In this case all of the application's outbound connections are hijacked to 127.0.0.1:15001, so there is only one destination IP/port. Source port exhaustion happens easily because the local port range is only 32768 to 60999. Once the application reaches a high outbound connection rate (about 470 cps), the local ports are used up and TCP connections between the app and the sidecar fail.

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768    60999

The TCP TIME_WAIT duration is 60s, so the maximum allowed CPS is about (60999-32768)/60 = 470.
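To check how close a pod is to this limit, one possible approach is to count TIME_WAIT sockets per remote endpoint and compare the count with the size of the ephemeral port range. The sketch below is not an Istio tool. It assumes a Linux pod, IPv4 only, and a little-endian host (which affects how addresses are encoded in /proc/net/tcp). The helper names are hypothetical. Which remote address shows up for intercepted connections depends on how the redirect is reported, so the sketch groups by whatever remote endpoint the kernel lists rather than hard-coding one.

```python
import socket
from collections import Counter

TIME_WAIT = "06"  # TCP state code for TIME_WAIT in /proc/net/tcp


def parse_port_range(text):
    """Parse the contents of ip_local_port_range, e.g. '32768\t60999'."""
    low, high = (int(x) for x in text.split())
    return low, high


def decode_endpoint(hex_endpoint):
    """Turn '0100007F:3A99' into ('127.0.0.1', 15001).

    Assumes a little-endian host, where the address bytes are reversed.
    """
    addr_hex, port_hex = hex_endpoint.split(":")
    addr = socket.inet_ntoa(bytes.fromhex(addr_hex)[::-1])
    return addr, int(port_hex, 16)


def time_wait_by_remote(proc_net_tcp_text):
    """Count TIME_WAIT sockets per remote (address, port)."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        counts[decode_endpoint(fields[2])] += 1
    return counts


def report(range_path="/proc/sys/net/ipv4/ip_local_port_range",
           tcp_path="/proc/net/tcp"):
    """Print the busiest remote endpoints by TIME_WAIT count."""
    with open(range_path) as f:
        low, high = parse_port_range(f.read())
    with open(tcp_path) as f:
        counts = time_wait_by_remote(f.read())
    total = high - low + 1
    for (addr, port), n in counts.most_common(5):
        print(f"{addr}:{port} TIME_WAIT={n} ({n / total:.1%} of {total} ports)")
```

Calling `report()` inside the pod's network namespace prints the top remote endpoints. A single endpoint holding most of the ephemeral range in TIME_WAIT would be consistent with the exhaustion described here.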

shonecyx avatar May 24 '24 01:05 shonecyx

For now we have changed the client to use keep-alive, but the single outbound port remains a potential issue.

shonecyx avatar May 30 '24 17:05 shonecyx

Related issue: https://github.com/istio/istio/issues/38982, which concerns traffic between the gateway and the sidecar.

shonecyx avatar Jul 03 '24 02:07 shonecyx