percona-server-mongodb-operator
fix ExternalTrafficPolicy for different service types
Problem: The ExternalTrafficPolicy for services of type LoadBalancer and NodePort was set incorrectly. For LoadBalancer services the policy should be Local, so that external connections are answered only by the nodes where the service's pods are running; instead it was set to Cluster. Conversely, NodePort services had ExternalTrafficPolicy set to Local when it should be Cluster for proper internal routing.
Cause: The wrong ExternalTrafficPolicy was applied for NodePort and LoadBalancer services, causing inefficient routing for external traffic.
Solution: The ExternalTrafficPolicy for NodePort services has been updated to Cluster, and for LoadBalancer services to Local, ensuring proper traffic flow for both service types.
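For illustration, the change corresponds roughly to Service specs like the following (a sketch only; the actual Services are generated by the operator, and the names here are hypothetical):

```yaml
# LoadBalancer: Local routes external traffic only to nodes that
# actually run a pod of this service (and preserves the client
# source IP).
apiVersion: v1
kind: Service
metadata:
  name: my-cluster-rs0-0        # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
---
# NodePort: Cluster lets any node forward the node port to the
# pod, wherever it is scheduled.
apiVersion: v1
kind: Service
metadata:
  name: my-cluster-rs0-0-np     # hypothetical name
spec:
  type: NodePort
  externalTrafficPolicy: Cluster
```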
Hi @clsv, maybe making these settings configurable is a better idea. Users would then be able to adjust them for their needs via the CR (https://github.com/percona/percona-server-mongodb-operator/blob/main/deploy/cr.yaml#L227), which is impossible right now.
Hi @hors, I encountered this issue yesterday in version 1.17.0, and I had to create a deployment with a kubectl patch for services of type NodePort. While I agree that configurable settings are a good idea, I still believe the default behavior should match expectations: when I request a LoadBalancer, I would expect only the nodes running MongoDB to respond to external traffic.
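The workaround was essentially a merge patch like the one below, applied with `kubectl patch svc <service-name> --type merge -p "$(cat patch.yaml)"` (the service name and file name are placeholders):

```yaml
# Force Cluster policy on the exposed NodePort Service so any
# node can forward traffic to the pod.
spec:
  externalTrafficPolicy: Cluster
```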
@clsv Anton, hi! Thank you for your contribution. Did you run into a specific problem with this setup? Do you use NodePort? In which configuration? I am trying to understand your use cases and what the best solution for most of our users would be.
@nmarukovich Hi, yes, I encountered a problem with the default setup, so I had to create a patcher for the service. For external connections to the MongoDB replica set, we should use the splitHorizons option, where we specify which IP and port to use for each replica set node. When ExternalTrafficPolicy is set to Local and a pod is scheduled on another node, I won't be able to connect to MongoDB.
I see. In this case it looks reasonable to use Cluster as the default value for both NodePort and LoadBalancer. If I understand correctly how an external LoadBalancer works, we can meet the same problem for it too. Does that make sense to you?
And for sure we need to make this value configurable (I can do it in a separate PR if that is faster).
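A possible shape for this in deploy/cr.yaml, assuming a new externalTrafficPolicy field under the existing expose section (the field is a proposal, not an existing option):

```yaml
replsets:
  - name: rs0
    expose:
      enabled: true
      exposeType: LoadBalancer
      # proposed new field, not available in current releases:
      externalTrafficPolicy: Local   # or Cluster
```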
> If I understand correctly how an external LoadBalancer works, we can meet the same problem for it too.
Usually, when we create a LoadBalancer, the cloud provider sets up health checks for specific ports. Having the NodePort respond only on the node where the specific pod is running is good for the service overall.
Sorry, I didn't catch your idea and why externalTrafficPolicy: Local for serviceType LoadBalancer would be better for general cases. Could you please add a bit more detail?
> Sorry, I didn't catch your idea and why externalTrafficPolicy: Local for serviceType LoadBalancer would be better for general cases.
When externalTrafficPolicy is set to Local, the LoadBalancer only sends traffic to nodes with active pods for that service. This improves performance because the LoadBalancer won't direct traffic to nodes without available pods. Traffic is routed only to pods on the same node, minimizing unnecessary cross-node network traffic and potentially reducing latency, which is beneficial in high-traffic environments; it also avoids unnecessary health checks against every cluster node.
However, for NodePort, externalTrafficPolicy: Local presents a significant problem, as it can lead to connectivity issues for clients when working with a MongoDB replica set. If a pod moves to a different node, the NodePort will no longer respond on the original node specified in the connection string. This can cause client applications to lose connection to the MongoDB replica set if they try to connect to a node without an active MongoDB pod.
The client’s source IP is preserved only with ExternalTrafficPolicy: Local. This is very important from the administrator's perspective (e.g. for query analytics). For the case you mention about NodePort with externalTrafficPolicy: Local, as long as you configure multiple nodes in the connection string it should not be a problem to lose one of them (the driver should take care of connecting to another one). So from my perspective I am inclined to just have ExternalTrafficPolicy: Local everywhere.
> For the case you mention about NodePort with externalTrafficPolicy: Local, as long as you configure multiple nodes in the connection string it should not be a problem to lose one of them (the driver should take care of connecting to another one).
For example, suppose you have 4 nodes and 3 pods, with 3 services for the replica set members rs-0, rs-1, and rs-2 using NodePorts 32017, 32018, and 32019 and externalTrafficPolicy: Local. In splitHorizons we configure:

- mongo-rs0-0 → external: node-01:32017
- mongo-rs0-1 → external: node-02:32018
- mongo-rs0-2 → external: node-03:32019

To connect via the NodePorts I should use the connection string mongodb://node-01:32017,node-02:32018,node-03:32019/. MongoDB will only work if mongo-rs0-0 runs on node-01, mongo-rs0-1 on node-02, and mongo-rs0-2 on node-03. If the pods run on a different node assignment it won't work; similarly, if a pod is scheduled on node-04, it also won't work.
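Expressed as a CR fragment (the horizon mapping is taken from the example above; the exact nesting under replsets is an assumption and may differ between operator versions):

```yaml
replsets:
  - name: rs0
    splitHorizons:
      mongo-rs0-0:
        external: node-01:32017
      mongo-rs0-1:
        external: node-02:32018
      mongo-rs0-2:
        external: node-03:32019
```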
If you think this is okay, let's close the PR. I don’t understand why we've been discussing obvious things for several days...
ahh I missed the case of splitHorizon. I think the PR makes sense then.
@clsv sorry for such a long discussion. Thanks for your clarification. PR approved. Waiting for the tests.
We will never merge this pull request 😅
@clsv this is waiting for my approval. I'll review today.
service-per-pod test fails because crVersion is still set to 1.18.0 in version/version.go and deploy/cr.yaml.
@clsv you can set it to 1.19.0 to ensure all tests pass, or you can wait a few days; we'll be releasing v1.18.0 this week and will update the versions in the main branch to 1.19.0.
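That is, bumping the value in deploy/cr.yaml (and the matching constant in version/version.go); a minimal sketch of the relevant CR part:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
spec:
  crVersion: 1.19.0
```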
| Test name | Status |
|---|---|
| arbiter | passed |
| balancer | passed |
| custom-replset-name | passed |
| custom-tls | passed |
| custom-users-roles | passed |
| custom-users-roles-sharded | passed |
| cross-site-sharded | passed |
| data-at-rest-encryption | passed |
| data-sharded | passed |
| demand-backup | passed |
| demand-backup-fs | passed |
| demand-backup-eks-credentials | passed |
| demand-backup-physical | passed |
| demand-backup-physical-sharded | passed |
| demand-backup-sharded | passed |
| expose-sharded | passed |
| ignore-labels-annotations | passed |
| init-deploy | passed |
| finalizer | passed |
| ldap | passed |
| ldap-tls | passed |
| limits | passed |
| liveness | passed |
| mongod-major-upgrade | passed |
| mongod-major-upgrade-sharded | passed |
| monitoring-2-0 | passed |
| multi-cluster-service | passed |
| non-voting | passed |
| one-pod | passed |
| operator-self-healing-chaos | passed |
| pitr | passed |
| pitr-sharded | passed |
| pitr-physical | passed |
| pvc-resize | passed |
| recover-no-primary | passed |
| replset-overrides | passed |
| rs-shard-migration | passed |
| scaling | passed |
| scheduled-backup | passed |
| security-context | passed |
| self-healing-chaos | passed |
| service-per-pod | passed |
| serviceless-external-nodes | passed |
| smart-update | passed |
| split-horizon | passed |
| storage | passed |
| tls-issue-cert-manager | passed |
| upgrade | passed |
| upgrade-consistency | passed |
| upgrade-consistency-sharded-tls | passed |
| upgrade-sharded | passed |
| users | passed |
| version-service | passed |
We ran 53 out of 53 tests.
commit: https://github.com/percona/percona-server-mongodb-operator/pull/1673/commits/714b1d67f7eeaa61d97f70169267368bb0598467
image: perconalab/percona-server-mongodb-operator:PR-1673-714b1d67
@clsv, thank you for your contribution