cloud-provider-openstack
[occm]: introduce readiness gates for pods
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened:
The current LoadBalancer service implementation has a flaw: if a node's network is broken during a deployment update, but kubelet still advertises the pod as up and ready, the rollout can cause an outage by shutting down the old, healthy pods.
What you expected to happen:
There should be a way to monitor a pod's network readiness from outside the k8s cluster using pod readiness gates. See this video for more details: https://www.youtube.com/watch?v=Vw9GmSeomFg&t=289s
Anything else we need to know?:
PR #1720 introduced support for externalTrafficPolicy: Local, which adds kube-proxy-based monitors to LB pool members. We can keep this logic for OCCM configured without a router.
However, for OCCM with a router configured, we can use the service's endpoints instead of node ports and patch the pods' readiness gates according to the load balancer's member healthchecks. This approach would increase deployment update time (because of the LB healthcheck latency), but on the other hand it improves overall deployment availability. Additional advantages: more even traffic distribution between pods and pod-based traffic affinity.
See also https://cloud.google.com/kubernetes-engine/docs/concepts/container-native-load-balancing#pod_readiness and https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/pod_readiness_gate/ for reference. Unlike the dedicated GCE and AWS ingress/LB controllers, I'd like to implement the readiness gates feature directly in the OCCM controller, so it can be toggled seamlessly, especially for existing deployments.
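To make the readiness-gate part concrete, here is a minimal sketch of the patching step, assuming a hypothetical condition type and that the controller already knows whether the pod's Octavia member is ONLINE (how that status is obtained is left open):

```go
package sketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical condition type that pods would list under spec.readinessGates.
const lbReadyCondition = "openstack.org/load-balancer-member-ready"

// syncReadinessGate patches the pod's status condition based on whether the
// corresponding Octavia pool member is reported as ONLINE by the health monitor.
func syncReadinessGate(ctx context.Context, cs kubernetes.Interface, pod *corev1.Pod, memberOnline bool) error {
	status := corev1.ConditionFalse
	if memberOnline {
		status = corev1.ConditionTrue
	}
	// Strategic merge patch merges status.conditions by "type", so only the
	// load-balancer condition is touched.
	patch := []byte(fmt.Sprintf(
		`{"status":{"conditions":[{"type":%q,"status":%q,"reason":"OctaviaMemberOperatingStatus"}]}}`,
		lbReadyCondition, status))
	_, err := cs.CoreV1().Pods(pod.Namespace).Patch(
		ctx, pod.Name, types.StrategicMergePatchType, patch, metav1.PatchOptions{}, "status")
	return err
}
```

A pod opts in by listing that condition type under spec.readinessGates; kubelet then keeps the pod out of Ready until the controller sets the condition to True (the controller needs RBAC to patch pods/status).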
Feel free to provide your suggestions or objections.
cc @databus23 @jichenjc @zetaab
Can the current OCCM load balancer implementation route traffic to service endpoints? I cannot see anything in the code, so I think it's not supported (yet?).
Another thing I am thinking about: when you create an Octavia Amphora load balancer, how are you going to add routes to it? The VRRP ports are visible under ports, but I'm not sure whether you can modify routes on those, or how this would work at all. Can you add a service endpoint (pod IP) as a target of the load balancer (given that a pod IP is not a port in OpenStack, unless you use something like Kuryr)? In that case you might also need to modify OpenStack port security policies.
To me this looks like you are not running an overlay network inside the Kubernetes cluster at all, but rather using Kuryr or similar, with all pods located in an OpenStack network?
Can the current OCCM load balancer implementation route traffic to service endpoints? I cannot see anything in the code, so I think it's not supported (yet?).
If routes are defined on the router configured for the private network, then traffic from the load balancer to a particular pod CIDR is routed through the router to the corresponding node, as sketched below.
Another thing I am thinking about: when you create an Octavia Amphora load balancer, how are you going to add routes to it?
routes are configured on the router, see above.
Can you add a service endpoint (pod IP) as a target of the load balancer (given that a pod IP is not a port in OpenStack, unless you use something like Kuryr)?
yes, see above.
To me this looks like you are not running an overlay network inside the Kubernetes cluster at all, but rather using Kuryr or similar, with all pods located in an OpenStack network?
Right, but this shouldn't be a requirement for direct pod routing.
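For reference, the "routes on the router" part is just Neutron extra routes. A rough sketch with gophercloud follows; the router ID, pod CIDR and node address are placeholders, and the exact UpdateOpts field shapes may differ between gophercloud versions:

```go
package sketch

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/layer3/routers"
)

// addPodCIDRRoute adds an extra route on the Neutron router so that traffic
// for a node's pod CIDR is forwarded to that node's address, which is what
// lets the load balancer reach pod IPs directly.
func addPodCIDRRoute(net *gophercloud.ServiceClient, routerID, podCIDR, nodeAddr string) error {
	routes := []routers.Route{{
		DestinationCIDR: podCIDR,  // e.g. "100.101.4.0/24"
		NextHop:         nodeAddr, // the node's address on the router's subnet
	}}
	// Note: this overwrites the router's route list; a real implementation
	// would merge with the existing routes first.
	_, err := routers.Update(net, routerID, routers.UpdateOpts{Routes: &routes}).Extract()
	return err
}
```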
However, for OCCM with a router configured, we can use the service's endpoints instead of node ports and patch the pods' readiness gates according to the load balancer's member healthchecks.
So this is the key change proposed, correct? Use the svc's endpoints, and if a pod is an LB member's backend, then when the LB monitor detects something wrong, the pod will be marked unhealthy?
@jichenjc correct
This sounds like something that makes assumptions about how the CNI would react - i.e. would it route traffic arriving at the node with a pod IP into the actual pod netns? This doesn't feel right when the CNI does encapsulation. What CNI does this proposal have in mind?
We use flannel in our setup:
```
# iptables-save | grep 100.101.0.0/24
-A POSTROUTING ! -s 100.101.0.0/16 -d 100.101.0.0/24 -m comment --comment "flanneld masq" -j RETURN
```
I'm not sure how it works with other CNIs.
This doesn't feel right when the CNI does encapsulation
Good point, thanks for the reminder. If something like Calico does not work with this, then we need to rethink how we can achieve it.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
@dulek Is this related to your security group PR in any way? Are we happy letting this close? Do we have another solution?
/remove-lifecycle rotten
@mdbooth I still need this
@dulek Is this related to your security group PR in any way? Are we happy letting this close? Do we have another solution?
I don't see it as related to my work. If I understand the problem here correctly, it's about the LB members being "ONLINE" by default. I discussed that issue with the Octavia folks at the OpenInfra Summit. We can't really expect this to be changed in Octavia, as Octavia mimics the behavior of hardware LBs, which work this way.
The solution they offered is to make the health monitor periods smaller. I bet that doesn't really work here?
Have we considered adding new LB members as disabled and only enabling them after some timeout?
Have we considered adding new LB members as disabled and only enabling them after some timeout?
good idea. I need to test it in my env.
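A minimal sketch of what "add as disabled, enable later" could look like with gophercloud; the pool ID, pod IP and port are placeholders, and when exactly the enable step fires is the open question:

```go
package sketch

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
)

// addMemberDisabled creates the pool member with admin_state_up=false, so the
// LB does not route to it until we explicitly flip it on.
func addMemberDisabled(lb *gophercloud.ServiceClient, poolID, podIP string, port int) (*pools.Member, error) {
	down := false
	return pools.CreateMember(lb, poolID, pools.CreateMemberOpts{
		Address:      podIP,
		ProtocolPort: port,
		AdminStateUp: &down,
	}).Extract()
}

// enableMember flips admin_state_up to true once the pod (or a timeout) says
// the backend is actually ready to receive traffic.
func enableMember(lb *gophercloud.ServiceClient, poolID, memberID string) error {
	up := true
	_, err := pools.UpdateMember(lb, poolID, memberID, pools.UpdateMemberOpts{
		AdminStateUp: &up,
	}).Extract()
	return err
}
```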
Have we considered adding new LB members as disabled and only enabling them after some timeout?
good idea. I need to test it in my env.
I won't be surprised if disabled ones aren't even evaluated by the health monitors, but please check that.
Also can you confirm that my understanding of the problem is correct? I'd love to find a viable solution to this.
Also can you confirm that my understanding of the problem is correct?
right
So this issue appears to be more critical than before. I ran a number of tests, and neither creating a member in backup state nor in disabled state helped. Even worse: every member state update causes an outage until the healthcheck verdict is updated.
Can you elaborate on how the healthcheck system is configured for you then?
Every member add/update action retriggers the members' healthchecks. If only 1 of 20 members is active, traffic will reach all 20 members until the health monitor has verified all pool members. The default health monitor is 3 tries with a 20-second delay, i.e. about 1 minute, so for up to a minute traffic will be forwarded to inactive members.
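For context, those are the knobs that "making the health monitor periods smaller" would shrink. A sketch of such a monitor with gophercloud, using the values described above (pool ID is a placeholder):

```go
package sketch

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/monitors"
)

// createTCPMonitor creates a TCP health monitor on the pool. With the values
// described above (3 tries, 20s delay) the monitor needs roughly
// delay * retries ~= 60s to settle the members' operating status, which is
// the window where traffic can still reach inactive members.
func createTCPMonitor(lb *gophercloud.ServiceClient, poolID string) (*monitors.Monitor, error) {
	return monitors.Create(lb, monitors.CreateOpts{
		PoolID:         poolID,
		Type:           "TCP",
		Delay:          20, // seconds between health checks
		Timeout:        10, // per-check timeout (seconds)
		MaxRetries:     3,  // successful checks before a member goes ONLINE
		MaxRetriesDown: 3,  // failed checks before a member is marked down
	}).Extract()
}
```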
Wait, so when I add a member, Octavia/Amphora will reset states and consider all members to be healthy until all healthchecks are resolved? That sounds like an Octavia bug to me.
@dulek right. can you check this behavior in your env?
Another drawback of externalTrafficPolicy: Local: removing a pod from an existing service will cause downtime, since the LB pool member healthchecks have a delay.
Example: you have 4 nodes with 4 pods (node anti-affinity) and a single service with type: LoadBalancer and externalTrafficPolicy: Local:
node1 -> pod1
node2 -> pod2
node3 -> pod3
node4 -> pod4
pool members will look like:
member1 -> node1 -> pod1
member2 -> node2 -> pod2
member3 -> node3 -> pod3
member4 -> node4 -> pod4
Scaling the deployment down to 3 replicas will remove a pod from a node (e.g. node4), but due to the LB healthcheck latency, there is a high chance that a new connection will still be forwarded to node4, which may cause a connection timeout:
member1 -> node1 -> pod1
member2 -> node2 -> pod2
member3 -> node3 -> pod3
member4 -> node4 -> X
The same problem occurs during regular deployment updates, when a new pod is created and an old pod is removed.
So far my assumption is that the readiness gate controller must be event-driven and proactively disable (or mark offline) the corresponding members when a pod is being destroyed; see the sketch below.
UPD: I found that the ProxyTerminatingEndpoints feature gate was promoted to beta in 1.26 (https://github.com/kubernetes/kubernetes/issues/85643). The behavior with/without this feature is a bit different, but new connections still get broken in both cases.
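A rough sketch of the event-driven part, assuming the controller keeps a pod-IP to member-ID mapping per pool (all names here are hypothetical); whether disabling the member avoids the state-reset problem described above is exactly what still needs testing:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/pools"
)

// onPodTerminating would be wired to the pod informer's update/delete events
// and fire as soon as the pod gets a deletion timestamp, instead of waiting
// for the LB health monitor to notice that the backend is gone.
func onPodTerminating(lb *gophercloud.ServiceClient, poolID string, memberIDByPodIP map[string]string, pod *corev1.Pod) error {
	memberID, ok := memberIDByPodIP[pod.Status.PodIP]
	if !ok {
		return nil // this pod does not back a member of the pool
	}
	down := false
	// Disabling the member (admin_state_up=false) takes it out of rotation
	// so new connections are not forwarded to the dying pod.
	_, err := pools.UpdateMember(lb, poolID, memberID, pools.UpdateMemberOpts{
		AdminStateUp: &down,
	}).Extract()
	return err
}
```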
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.