hcloud-cloud-controller-manager
Readiness gate support for 100% ingress HA
Please add support for Readiness Gates, so that ingress-nginx (and other ingress controllers) receive injected information about whether Hetzner Cloud has already attached the target properly.
Right now, without that information, when I redeploy NGINX onto other nodes, Kubernetes creates the new Pods on different nodes faster than Hetzner reconfigures the load balancer, and ingress simply breaks for about half a minute until Hetzner catches up.
When you create a Service with type: LoadBalancer, this happens:
- Kubernetes assigns a NodePort for every port in the service
- HCCM creates a new Load Balancer
- For every Port in the Kubernetes Service, HCCM creates a new "Service" in the Load Balancer, listening on the port and forwarding the traffic to the Node Port
- For every Node in the Kubernetes Cluster, HCCM creates a new "Target" in the Load Balancer, pointing at the Node/Server
Once the traffic is sent to the Node Port on any Kubernetes node, Kubernetes/CNI is responsible for the traffic. Usually kube-proxy or your CNI then checks where Pods for the Service are running and forwards the traffic to one of the Pods; it does not matter on which Node the Pod is running. If you have Pods marked as Ready in your Service which are actually not ready to receive traffic, that would explain your problem.
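As a sketch, the steps above correspond to a Service like this (the name, selector, and ports are illustrative, not taken from the issue):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-ingress
spec:
  type: LoadBalancer
  selector:
    app: my-ingress
  ports:
    - name: http
      port: 80        # HCCM creates an LB "Service" listening on port 80
      targetPort: 8080
    - name: https
      port: 443       # ... and another LB "Service" listening on port 443
      targetPort: 8443
# Kubernetes assigns a NodePort for each entry; each LB "Service" forwards
# to that NodePort, and every Node in the cluster is added as a "Target".
```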
@apricote ok, I'll explain the problem in more detail, because you don't understand what I mean :D
So, here is the use case:
- 4 nodes in the cluster
- 1 ingress-nginx Pod running on one of the nodes
Status in the Hetzner Load Balancer is:
- 8 targets (ports 80+443), because there are 4 nodes as backends with 2 ports each
- 2 targets healthy, because ingress-nginx runs on 1 node with 2 ports
Now, the problematic scenario:
- Run kubectl -n ingress-nginx delete pod ingress-nginx-rsid-1234
- Kubernetes terminates the nginx Pod and immediately starts it on another node, where the Pod state is Running and Ready.
- The Hetzner LB still reports the old target as healthy (because health checks run at a longer interval). The website is down from this point.
- Opening the website throws an error: the backend cannot be reached because the LB still points to the old target.
- The Hetzner LB runs a few health checks over the targets and finally decides that the old target (on the old node) is no longer healthy and that the new target (on the new node) is now healthy.
- The website is up again after 30-60 seconds.
This is a real problem; the only workaround is to run ingress-nginx as a DaemonSet so Hetzner always points to all nodes. Although even then, recreating those Pods has the same problem.
A Readiness Gate can inject information about the external LB's state, so Kubernetes is aware that the new Pod (on the new node) is not ready yet (even though the application has started and is ready) until the external load balancer reports it as healthy.
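For illustration, a Readiness Gate is just an extra Pod condition that the kubelet waits for and that an external controller must set. The condition type below is hypothetical for HCCM (AWS uses target-health.elbv2.k8s.aws/<name>); this is only a sketch of what the requested feature could look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ingress-nginx-controller
spec:
  readinessGates:
    # Hypothetical condition type: the Pod would stay NotReady until a
    # controller sets this condition to "True", i.e. until the Hetzner
    # Load Balancer reports the corresponding target as healthy.
    - conditionType: load-balancer.hetzner.cloud/target-healthy
  containers:
    - name: controller
      image: registry.k8s.io/ingress-nginx/controller:v1.9.4
```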
Check how it works in AWS: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/pod_readiness_gate/ especially the use case: "Rollout of new pods takes less time than it takes the AWS Load Balancer controller to register the new pods and for their health state turn »Healthy« in the target group"
2 targets healthy, because ingress-nginx runs on 1 node with 2 ports
How do you deploy ingress-nginx? Usually Kubernetes routes traffic arriving on these ports to whatever Node currently has a matching Pod, so I would expect all targets to be healthy.
Looking at the AWS docs, I think they need this because they route directly from the Load Balancer to the Internal IP of the Pod (through ENI), which is not something that Hetzner Cloud Load Balancers do, they always target your Nodes.
@apricote I'm deploying NGINX as a regular LoadBalancer object, hence it exposes ports 80, 443 (and 22 for GitLab in my case) as Node Ports.
Also, to be honest, I would prefer my current behaviour over "Kubernetes accepting traffic and routing it everywhere", because then I control where my traffic goes from the LB to the nodes (for example, I might want to limit LB-to-node traffic to a single region: I don't want traffic going from Falkenstein to Helsinki, because the latency there is a terrible 22ms!)
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/ingress-nginx-controller-76df64f546-h8bsj 1/1 Running 0 6d3h 10.244.4.218 kube-node-nbg1-cx31-1 <none> <none>
pod/ingress-nginx-controller-76df64f546-wcwnf 1/1 Running 0 6d3h 10.244.0.58 kube-master-1 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cert-manager-webhook-hetzner ClusterIP 10.97.14.83 <none> 443/TCP 59d
service/ingress-nginx-controller LoadBalancer 10.98.229.172 kube-lb.example.com 80:31246/TCP,443:30411/TCP,22:31811/TCP 59d
service/ingress-nginx-controller-admission ClusterIP 10.103.212.168 <none> 443/TCP 59d
Helm Config for my ingress-nginx:
controller:
  service:
    externalTrafficPolicy: Local
    annotations:
      external-dns.alpha.kubernetes.io/hostname: "*.k8s.trng.me,*.example.com"
      load-balancer.hetzner.cloud/name: "kube"
      load-balancer.hetzner.cloud/location: fsn1
      load-balancer.hetzner.cloud/use-private-ip: "true"
      load-balancer.hetzner.cloud/uses-proxyprotocol: "true"
      # An external hostname NEEDS TO BE set here, otherwise
      # hcloud-cloud-controller-manager will put the real external IP of the
      # LB here, kube-proxy will route that traffic internally instead of to
      # the real LB, and traffic will break because non-proxied traffic goes
      # into nginx, which expects the PROXY protocol
      load-balancer.hetzner.cloud/hostname: kube-lb.example.com
  config:
    use-proxy-protocol: true
    use-forwarded-headers: "true"
    compute-full-forwarded-for: "true"
  extraArgs:
    default-ssl-certificate: "infra/tls-default"
  watchIngressWithoutClass: false
  ingressClassByName: false
  ingressClassResource:
    name: ohcs-nginx # default: nginx
    #enabled: true
    default: true
    controllerValue: "k8s.io/ohcs-nginx" # default: k8s.io/ingress-nginx
  replicaCount: 2
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
  minReadySeconds: 30
  resources:
    requests:
      cpu: 15m
      memory: 128Mi
    limits:
      #cpu: 1
      memory: 192Mi
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          preference:
            matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
        - weight: 20
          preference:
            matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: controller
                app.kubernetes.io/instance: ingress-nginx
                app.kubernetes.io/name: ingress-nginx
            topologyKey: kubernetes.io/hostname
          weight: 50
tcp:
  22: "gitlab/gitlab-gitlab-shell:22"
Okay, externalTrafficPolicy: Local stops Kubernetes from forwarding requests to another server with a matching Pod; a node only answers requests if it has a matching Pod locally. That explains why only the targets on the node running ingress-nginx are healthy.
for example, I might want to limit LB-to-node traffic to a single region: I don't want traffic going from Falkenstein to Helsinki, because the latency there is a terrible 22ms!
You can do this with the load-balancer.hetzner.cloud/node-selector annotation (added in 1.18.0). There you can specify any Node label selector, for example topology.kubernetes.io/region=fsn1. Afterwards, only the nodes matching the label selector will be added to the Load Balancer.
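A minimal sketch of how that annotation could be applied (the Service name and selector are illustrative; only the annotation key and the region label come from the comment above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  annotations:
    # Only Nodes in region fsn1 are added as Load Balancer targets
    load-balancer.hetzner.cloud/node-selector: "topology.kubernetes.io/region=fsn1"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
```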
@apricote I know I can use a label, but... I don't want to, because I still want to be able to use Helsinki as a last resort in case the other regions fail ;) I think this makes sense.
Directing traffic from the LB straight to the right nodes just makes more sense than random balancing that Kubernetes then balances again. Latency is the answer to "but why"...
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.