
fix ExternalTrafficPolicy for different service types

Open clsv opened this issue 1 year ago • 4 comments

Problem: The ExternalTrafficPolicy for services of type LoadBalancer and NodePort was set incorrectly. For LoadBalancer services, the policy should be Local so that external connections reach only the nodes where the corresponding MongoDB pods are running; however, it was set to Cluster. Conversely, NodePort services had ExternalTrafficPolicy set to Local, when it should be Cluster for proper internal routing.

Cause: The wrong ExternalTrafficPolicy was applied for NodePort and LoadBalancer services, causing inefficient routing for external traffic.

Solution: The ExternalTrafficPolicy for NodePort services has been updated to Cluster, and for LoadBalancer services to Local, ensuring proper traffic flow for both service types.
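As a sketch of what the fixed defaults correspond to at the Service level (the names and selectors below are illustrative, not the operator's actual templates):

```yaml
# Illustrative Service manifests only; names/selectors are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: mongo-rs0-0-external        # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local      # external traffic reaches only nodes running the pod
  selector:
    statefulset.kubernetes.io/pod-name: mongo-rs0-0
  ports:
    - port: 27017
      targetPort: 27017
---
apiVersion: v1
kind: Service
metadata:
  name: mongo-rs0-0-nodeport        # hypothetical name
spec:
  type: NodePort
  externalTrafficPolicy: Cluster    # any node forwards to the pod, wherever it runs
  selector:
    statefulset.kubernetes.io/pod-name: mongo-rs0-0
  ports:
    - port: 27017
      targetPort: 27017
      nodePort: 32017
```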

clsv avatar Oct 08 '24 15:10 clsv

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Oct 08 '24 15:10 CLAassistant

Hi @clsv, maybe making these settings configurable is a better idea, so users can change them to fit their needs. For example, users could change it via the CR (https://github.com/percona/percona-server-mongodb-operator/blob/main/deploy/cr.yaml#L227), which is currently impossible.
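A hypothetical shape for such a setting in deploy/cr.yaml (the `externalTrafficPolicy` field below is an assumption for illustration; it did not exist in the CR at the time of this discussion):

```yaml
replsets:
  - name: rs0
    expose:
      enabled: true
      exposeType: LoadBalancer
      externalTrafficPolicy: Local   # hypothetical field; Local or Cluster
```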

hors avatar Oct 09 '24 11:10 hors

Hi @hors, I encountered this issue yesterday in version 1.17.0, and I had to create a deployment that applies a kubectl patch to the services of type NodePort. While I agree that having flexible settings is a good idea, I still believe the defaults should match expected behavior: when I request a LoadBalancer, I expect only the nodes running MongoDB to respond to external traffic.
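The workaround described here could look roughly like the following (the service name and patch file name are illustrative, not taken from the thread):

```yaml
# patch.yaml -- applied with, e.g.:
#   kubectl patch svc mongo-rs0-0 --type merge --patch-file patch.yaml
# Service name is an assumption for illustration.
spec:
  externalTrafficPolicy: Cluster
```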

clsv avatar Oct 09 '24 12:10 clsv

@clsv Anton, hi! Thank you for your contribution. Did you run into a specific problem with this setup? Do you use NodePort? In which configuration? I'm trying to understand your use cases and what the best solution would be for most of our users.

nmarukovich avatar Oct 27 '24 21:10 nmarukovich

@nmarukovich Hi, yes, I encountered a problem with the default setup, so I had to create a patcher for the service. For external connections to the MongoDB replica set, we should use the splitHorizons option, where we specify which IP and port to use for each replica set node. When ExternalTrafficPolicy is set to Local, if a pod is scheduled on another node, I won't be able to connect to MongoDB.
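For context, splitHorizon in the operator CR maps each replica set pod to an externally reachable address. A minimal sketch, assuming illustrative pod names and host:port values (check deploy/cr.yaml for the exact schema):

```yaml
replsets:
  - name: rs0
    splitHorizons:                   # maps pod name -> externally visible address
      my-cluster-rs0-0:
        external: node-01:32017      # illustrative host:port
      my-cluster-rs0-1:
        external: node-02:32018
      my-cluster-rs0-2:
        external: node-03:32019
```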

clsv avatar Oct 28 '24 09:10 clsv

> @nmarukovich Hi, yes, I encountered a problem with the default setup, so I had to create a patcher for the service. For external connections to the MongoDB replica set, we should use the splitHorizons option, where we specify which ip and port to use for each replica set node. When ExternalTrafficPolicy is set to Local, if a pod is scheduled on another node, I won't be able to connect to MongoDB.

I see. In that case it looks reasonable to use Cluster as the default for both NodePort and LoadBalancer. If I understand correctly how an external LoadBalancer works, we could hit the same problem there too. Does that make sense to you?

And we definitely need to make this value configurable (I can do that in a separate PR if it is faster).

nmarukovich avatar Oct 28 '24 12:10 nmarukovich

> If I understand correctly how external LoadBalancer works we can meet the same problem for it too.

Usually, when we create a LoadBalancer, the cloud provider sets up health checks for specific ports. Having the NodePort respond only on the nodes where the pods are actually running is good for the service overall.

clsv avatar Oct 28 '24 12:10 clsv

> > If I understand correctly how external LoadBalancer works we can meet the same problem for it too.
>
> Usually, when we create a LoadBalancer, the cloud provider sets up health checks for specific ports. When the NodePort responds only on the node where the specific pod is running, it is good for the service overall.

Sorry, I didn't catch your idea: why would externalTrafficPolicy: Local for serviceType LoadBalancer be better for general cases? Could you please add a bit more detail?

nmarukovich avatar Oct 28 '24 14:10 nmarukovich

> Sorry, I didn't catch your idea and why externaltrafficPolicy: Local for serviceType LoadBalancer will be better for general cases.

When externalTrafficPolicy is set to Local, the LoadBalancer only sends traffic to nodes with active pods for that service. This improves performance because the LoadBalancer won't direct traffic to nodes without available pods. Traffic is routed only to pods on the same node, minimizing unnecessary cross-node network traffic and potentially reducing latency, which is beneficial in high-traffic environments. It also avoids unnecessary health checks against every cluster node.
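Concretely, when a LoadBalancer Service has externalTrafficPolicy: Local, Kubernetes allocates a healthCheckNodePort that the cloud load balancer probes; nodes without a local endpoint fail the probe and are taken out of rotation. A sketch of the resulting spec (port values are illustrative):

```yaml
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  healthCheckNodePort: 30890   # allocated by Kubernetes, probed by the cloud LB
  ports:
    - port: 27017
      targetPort: 27017
```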

However, for NodePort, externalTrafficPolicy: Local presents a significant problem, as it can lead to connectivity issues for clients when working with a MongoDB replica set. If a pod moves to a different node, the NodePort will no longer respond on the original node specified in the connection string. This can cause client applications to lose connection to the MongoDB replica set if they try to connect to a node without an active MongoDB pod.

clsv avatar Oct 28 '24 21:10 clsv

The client's source IP is preserved only with ExternalTrafficPolicy: Local. This is very important from the administrator's perspective (e.g. for query analytics). As for the NodePort case with externalTrafficPolicy: Local, as long as you configure multiple nodes in the connection string, losing one of them should not be a problem (the driver should take care of connecting to another one). So from my perspective I am inclined to just use ExternalTrafficPolicy: Local everywhere.

igroene avatar Oct 29 '24 17:10 igroene

> For the case you mention about NodePort with externalTrafficPolicy: Local, as long as you configure multiple nodes in the connection string it should not be a problem to lose one of them (the driver should take care of connecting to another one).

For example, suppose you have 4 nodes and 3 pods, with 3 services for the replica set members rs-0, rs-1, and rs-2 using NodePorts 32017, 32018, and 32019 and externalTrafficPolicy: Local. In splitHorizons we configure it as follows:

```yaml
mongo-rs0-0:
  external: node-01:32017
mongo-rs0-1:
  external: node-02:32018
mongo-rs0-2:
  external: node-03:32019
```

When I connect to the NodePort, I should use the following connection string: `mongodb://node-01:32017,node-02:32018,node-03:32019/`. MongoDB will only work if mongo-rs0-0 runs on node-01, mongo-rs0-1 on node-02, and mongo-rs0-2 on node-03. If the pods run on a different node assignment, it won't work; similarly, if a pod is scheduled on node-04, it also won't work.

If you think this is okay, let's close the PR. I don’t understand why we've been discussing obvious things for several days...

clsv avatar Oct 29 '24 18:10 clsv

ahh I missed the case of splitHorizon. I think the PR makes sense then.

igroene avatar Oct 29 '24 19:10 igroene

@clsv sorry for such a long discussion. Thanks for your clarification. PR approved. Waiting for the tests.

nmarukovich avatar Oct 30 '24 09:10 nmarukovich

We will never merge this pull request 😅

clsv avatar Nov 06 '24 20:11 clsv

> We will never merge this pull request 😅

@clsv this is waiting for my approval. I'll review today.

egegunes avatar Nov 11 '24 10:11 egegunes

The service-per-pod test fails because crVersion is still set to 1.18.0 in version/version.go and deploy/cr.yaml.

@clsv you can set it to 1.19.0 to ensure all tests pass, or you can wait a few days: we'll be releasing v1.18.0 this week and will then bump the versions in the main branch to 1.19.0.

egegunes avatar Nov 12 '24 07:11 egegunes

| Test name | Status |
| --- | --- |
| arbiter | passed |
| balancer | passed |
| custom-replset-name | passed |
| custom-tls | passed |
| custom-users-roles | passed |
| custom-users-roles-sharded | passed |
| cross-site-sharded | passed |
| data-at-rest-encryption | passed |
| data-sharded | passed |
| demand-backup | passed |
| demand-backup-fs | passed |
| demand-backup-eks-credentials | passed |
| demand-backup-physical | passed |
| demand-backup-physical-sharded | passed |
| demand-backup-sharded | passed |
| expose-sharded | passed |
| ignore-labels-annotations | passed |
| init-deploy | passed |
| finalizer | passed |
| ldap | passed |
| ldap-tls | passed |
| limits | passed |
| liveness | passed |
| mongod-major-upgrade | passed |
| mongod-major-upgrade-sharded | passed |
| monitoring-2-0 | passed |
| multi-cluster-service | passed |
| non-voting | passed |
| one-pod | passed |
| operator-self-healing-chaos | passed |
| pitr | passed |
| pitr-sharded | passed |
| pitr-physical | passed |
| pvc-resize | passed |
| recover-no-primary | passed |
| replset-overrides | passed |
| rs-shard-migration | passed |
| scaling | passed |
| scheduled-backup | passed |
| security-context | passed |
| self-healing-chaos | passed |
| service-per-pod | passed |
| serviceless-external-nodes | passed |
| smart-update | passed |
| split-horizon | passed |
| storage | passed |
| tls-issue-cert-manager | passed |
| upgrade | passed |
| upgrade-consistency | passed |
| upgrade-consistency-sharded-tls | passed |
| upgrade-sharded | passed |
| users | passed |
| version-service | passed |

We ran 53 out of 53.

commit: https://github.com/percona/percona-server-mongodb-operator/pull/1673/commits/714b1d67f7eeaa61d97f70169267368bb0598467
image: perconalab/percona-server-mongodb-operator:PR-1673-714b1d67

JNKPercona avatar Nov 12 '24 11:11 JNKPercona

@clsv, thank you for your contribution

hors avatar Nov 12 '24 12:11 hors