aws-load-balancer-controller
Error and restart of pod
Describe the bug
E0121 19:53:01.825912 1 leaderelection.go:325] error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://xxxx:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": http2: client connection lost
error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://xxxx:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": context deadline exceeded
A spike in memory usage, followed by a container restart.
Steps to reproduce
Expected outcome
No restart
Environment
- AWS Load Balancer controller version 2.4.0
- Kubernetes version 1.25
- Using EKS (yes/no), if so version? Yes, eks.12
Additional Context:
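For context, the errors above come from the controller's leader election: the controller holds a resource lock (here a ConfigMap in kube-system) and must keep renewing it against the API server; if renewal fails for long enough, the manager gives up leadership and exits, and kubelet restarts the container. The following is a minimal, hypothetical sketch of how a controller-runtime manager is wired up with leader election, not the controller's actual source; the option names are controller-runtime's, but the timeout values are illustrative assumptions, not the controller's real defaults.

```go
// Minimal sketch (assumption, not aws-load-balancer-controller's actual code)
// of a controller-runtime manager configured with leader election.
// The LeaseDuration/RenewDeadline/RetryPeriod values are illustrative only.
package main

import (
	"os"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	leaseDuration := 15 * time.Second // how long the lock stays valid without renewal
	renewDeadline := 10 * time.Second // leader must renew within this window or step down
	retryPeriod := 2 * time.Second    // interval between renewal attempts

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "aws-load-balancer-controller-leader",
		LeaderElectionNamespace: "kube-system",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
	})
	if err != nil {
		os.Exit(1)
	}

	// If the API server is unreachable long enough that the lock cannot be
	// renewed within RenewDeadline, the manager stops, Start returns an error
	// ("problem running manager"), the process exits, and kubelet restarts
	// the container.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```

With settings like these, roughly ten seconds of API-server unreachability is enough for the manager to give up leadership and exit, which is consistent with the restart behaviour described above.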
@omfurman-ma Could you please share the steps to reproduce this issue and also share the controller logs around the time you saw this issue so that we can understand more about it?
@shraddhabang We saw these logs:
....
controller-runtime.webhook msg: serving webhook server
...
time: 2024-01-22T13:14:48.662328615Z stream: stderr message: W0122 13:14:48.662211 1 reflector.go:436] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: watch of *v1.IngressClass ended with: very short watch: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Unexpected watch close - watch lasted less than a second and no items received
2024-01-22T13:14:48.738491365Z stream: stderr message: W0122 13:14:48.738396 1 reflector.go:436] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: watch of *v1.Pod ended with: very short watch: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Unexpected watch close - watch lasted less than a second and no items received
2024-01-22T13:14:48.804215159Z stream: stderr message: E0122 13:14:48.804128 1 leaderelection.go:325] error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://XXXX:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": dial tcp XXXX:443: connect: connection refused
2024-01-22T13:15:10.85823938Z stream: stderr message: E0122 13:15:10.858155 1 leaderelection.go:325] error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://XXXX:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": context deadline exceeded
As for steps to reproduce, we don't know; it's not something that happened before. It just happened randomly on one cluster, then a day later on a different cluster, and the logs here are from a third cluster.
Just had it again:
message: E0125 12:52:28.540797 1 leaderelection.go:361] Failed to update lock: Put "https://xxxx:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": context deadline exceeded
msg: problem running manager
And then a restart happened
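This sequence (lock renewal failing with "context deadline exceeded", then "problem running manager", then a restart) matches how client-go leader election is designed to behave: once the leader can no longer renew the lock within the renew deadline, OnStoppedLeading fires and the process exits so another replica can take over. Below is a minimal sketch of that loop, assuming a ConfigMap/Lease-backed lock like the one in the logs; the timeout values and the POD_NAME identity are illustrative assumptions, not the controller's actual configuration.

```go
// Minimal sketch of the client-go leader election loop that produces the
// "error retrieving resource lock" / "Failed to update lock" messages above.
// The lock namespace/name mirror the logs; everything else is illustrative.
package main

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		os.Exit(1)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The lock seen in the logs: kube-system/aws-load-balancer-controller-leader.
	lock, err := resourcelock.New(
		resourcelock.ConfigMapsLeasesResourceLock,
		"kube-system",
		"aws-load-balancer-controller-leader",
		client.CoreV1(),
		client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")}, // hypothetical identity
	)
	if err != nil {
		os.Exit(1)
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second, // renewals run under this deadline, hence "context deadline exceeded"
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// run the controllers while this replica holds the lock
			},
			OnStoppedLeading: func() {
				// If the API server stays unreachable past RenewDeadline,
				// this fires, the manager shuts down, and the process exits,
				// so kubelet restarts the container.
				os.Exit(1)
			},
		},
	})
}
```

In other words, the restart is the controller deliberately exiting after losing the leader lock; the underlying question is why the connection to the API server was lost for that long.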
@omfurman-ma, hey, how many controller pods do you have? It doesn't look like a controller issue to me; it looks like a connection issue to the API server. Did any upgrade or restart happen before this issue occurred?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale