support for service.spec.trafficDistribution

Open nejec opened this issue 8 months ago • 3 comments

Is your feature request related to a problem? Please describe.

I would like to use trafficDistribution, an experimental feature in Kubernetes 1.31 that routes traffic to the nearest cluster node (e.g. one in the same availability zone).

Describe the solution you'd like

To avoid cross-availability-zone traffic costs, Kubernetes offers several ways to send traffic to a service endpoint in the same availability zone (or node):

  • service.kubernetes.io/topology-mode: Auto with Topology Aware Routing, which has the drawback that it does not handle failover if the cluster node in the same zone goes down.
  • service.spec.trafficDistribution: PreferClose
  • Istio VirtualService
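
For illustration, the first two options could look roughly like this on plain Service objects. This is a minimal sketch, not taken from the operator: the Service names, the `app: example` selector and the AMQP port are hypothetical placeholders, and the second manifest assumes a cluster where the trafficDistribution field is available.

```yaml
# Option 1: Topology Aware Routing, enabled through an annotation.
apiVersion: v1
kind: Service
metadata:
  name: example-topology-mode        # hypothetical name
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: example                     # hypothetical selector
  ports:
    - port: 5672
      targetPort: 5672
---
# Option 2: the trafficDistribution field on the Service spec.
apiVersion: v1
kind: Service
metadata:
  name: example-prefer-close         # hypothetical name
spec:
  selector:
    app: example                     # hypothetical selector
  ports:
    - port: 5672
      targetPort: 5672
  trafficDistribution: PreferClose
```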

A good explanation of the problem is available here.

Describe alternatives you've considered

For the time being, service.kubernetes.io/topology-mode: Auto can be used; however, it has some drawbacks and will be phased out in favour of trafficDistribution.

Using Istio for this specific purpose adds another moving part, which is not actually needed since Kubernetes already supports the feature; it is just not implemented in the operator.

Additional context

I have looked into the code, and since I am not a Golang programmer, I cannot write really good code for this. I can prepare something (but it will be mostly AI generated); however, I don't expect my code would be appropriate for such a big project. But I can try ;)

nejec avatar Apr 04 '25 18:04 nejec

I'm not very keen on adding support for an experimental feature, because the operator doesn't know the Kubernetes version of the API server. On earlier Kubernetes versions, the RabbitmqCluster spec would be accepted and then fail during reconcile. That is not a great UX.

I believe that this traffic distribution feature can be used with the override feature for the service: https://www.rabbitmq.com/kubernetes/operator/using-operator#override
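
A minimal sketch of what that override might look like, assuming the installed RabbitmqCluster CRD schema accepts the trafficDistribution field under the service override and the cluster itself supports it; the cluster name `my-rabbit` is a hypothetical placeholder:

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: my-rabbit                    # hypothetical name
spec:
  override:
    service:
      spec:
        # Assumption: this field is passed through to the Service that the
        # operator generates for the cluster.
        trafficDistribution: PreferClose
```

Whether the field reaches the generated Service can be checked by inspecting that Service's spec after the operator reconciles.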

Zerpet avatar Apr 15 '25 16:04 Zerpet

@Zerpet Yes, you are correct. The UX is a bit tricky to handle for these kinds of changes, since it would be really great if the operator could support a wide spectrum of Kubernetes versions. But I totally understand your point.

At the moment, the override does not work, since it seems the TrafficDistribution definition is behind a feature gate. There is probably a way around that, but I am unaware of it.

nejec avatar Apr 15 '25 16:04 nejec

> At the moment, override does not work since it seems TrafficDistribution definition is behind a feature gate. There probably is a possible way around that, but i am unaware of it.

The feature gate is unrelated to the override. You have to enable the feature explicitly in Kubernetes 1.30, since it's alpha and disabled by default there. According to the feature gates documentation, traffic distribution graduated to GA in Kubernetes 1.33. We usually let a few Kubernetes minors pass before supporting such new features in the Operator. In any case, if you deploy Kubernetes 1.32 or 1.33, you should be able to use traffic distribution through the service override.
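
For anyone testing on Kubernetes 1.30, here is a hedged sketch of enabling the gate on a local cluster. It assumes kind is used as the test environment (kind is not mentioned elsewhere in this thread) and that the gate is named ServiceTrafficDistribution, as listed in the feature gates documentation:

```yaml
# kind cluster configuration; pair it with a Kubernetes 1.30 node image.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  # Alpha and disabled by default in 1.30, so it is switched on here.
  ServiceTrafficDistribution: true
```

The file would be passed to `kind create cluster --config <file>`. On managed Kubernetes offerings, alpha feature gates are often not configurable, so this mainly applies to self-managed or local clusters.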

Zerpet avatar May 05 '25 16:05 Zerpet

This issue has been marked as stale due to 60 days of inactivity. Stale issues will be closed after a further 30 days of inactivity; please remove the stale label in order to prevent this occurring.

github-actions[bot] avatar Jul 05 '25 00:07 github-actions[bot]

Closing stale issue due to further inactivity.

github-actions[bot] avatar Aug 04 '25 00:08 github-actions[bot]