allowing endpoint-specific options for connection pooling
What this PR does / why we need it:
With the current implementation of ambient multi-cluster/multi-network, we have cases where requests sent to a remote cluster are not really well-distributed between the serving endpoints in the remote cluster. That happens because once a double-HBONE tunnel is stablished between a local waypoint and the remote E/W gateway, the E/W gateway is not able to identify different incoming requests in order to apply load balancing algorithms to them. A possible solution to that problem is to tweak connection pooling options in the waypoint, forcing it to create extra tunnels to the same remote E/W gateway allowing for better load balancing on the remote cluster. This problem is explained with more details here.
This PR is a way of mitigating this problem for the short term so users can have a better experience with ambient multi-cluster in beta. This PR introduces endpoint-specific options for connection pooling in Envoy. This way a local waypoint can have specific connection pooling options for remote E/W gateways.
The connection pooling option we're working with are max_streams_per_connection and max_concurrent_streams, which are options that are already available under Http2ProtocolOptions in Envoy. The first one limits the number of requests throughout the lifetime of a connection and the second limits the number of concurrent requests, in both cases creating extra connection if allowed by other global limits/policies.
IMPORTANT NOTES: Once a final solution for this problem is in place we can remove this mitigation without any damage to user-facing APIs. This mitigation is also being proposed to upstream Envoy here: https://github.com/envoyproxy/envoy/issues/42251, and once it's merged we will switch over to the upstream solution.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #58039
Special notes for your reviewer:
Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all
+1. There seems to be no agreement also on exact solution in Envoy
I don't think we should maintain patches on envoy here. This would be the first time we ever did this, and it doesn't seem sustainable to fork envoy like this. Is there any reason we cannot wait for it to merge upstream?
@howardjohn
We could wait, but that means releasing 1.29/ambient MC beta with this undesirable behaviour. The whole idea here is just to mitigate this problem until we get a proper solution upstream so users can enjoy ambient multi-cluster with a better experience in the short-time.
We could try to create an extension instead, but that's reasonably more complex than necessary. What do you think?
+1. There seems to be no agreement also on exact solution in Envoy @ramaraochavali
I'm creating a PR for that so we can accelerate the decision.