aws-load-balancer-controller
Error and restart of pod
Describe the bug
E0121 19:53:01.825912 1 leaderelection.go:325] error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://xxxx:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": http2: client connection lost
error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://xxxx:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": context deadline exceeded
A spike in memory usage and a container restart
Steps to reproduce
Expected outcome: No restart
Environment
- AWS Load Balancer controller version 2.4.0
- Kubernetes version 1.25
- Using EKS (yes/no), if so version? Yes, eks.12
Additional Context:
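For context, the lock named in the errors above is the controller's leader-election lock, which controller-runtime manages for it. Below is a minimal sketch of how such a manager is wired up, assuming controller-runtime's manager.Options field names; the durations and everything else here are illustrative assumptions, not the controller's actual configuration.

```go
package main

import (
	"log"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Illustrative durations; the values compiled into the
	// aws-load-balancer-controller itself may differ.
	leaseDuration := 15 * time.Second // how long the lock stays valid without renewal
	renewDeadline := 10 * time.Second // the leader must renew within this window or give up
	retryPeriod := 2 * time.Second    // how often lock operations are retried

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "aws-load-balancer-controller-leader",
		LeaderElectionNamespace: "kube-system",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
	})
	if err != nil {
		log.Fatal(err)
	}

	// If the API server stays unreachable past RenewDeadline, leadership is
	// lost and Start returns an error instead of blocking forever.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		log.Fatal(err)
	}
}
```

Under this model, a long enough interruption in API-server connectivity makes the manager exit rather than keep running without the lock, which would line up with the restarts reported above.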
@omfurman-ma Could you please share the steps to reproduce this issue and also share the controller logs around the time you saw this issue so that we can understand more about it?
@shraddhabang We saw these logs:
....
controller-runtime.webhook msg: serving webhook server
...
2024-01-22T13:14:48.662328615Z stderr W0122 13:14:48.662211 1 reflector.go:436] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:167: watch of *v1.IngressClass ended with: very short watch: pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:167: Unexpected watch close - watch lasted less than a second and no items received
2024-01-22T13:14:48.738491365Z stderr W0122 13:14:48.738396 1 reflector.go:436] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:167: watch of *v1.Pod ended with: very short watch: pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:167: Unexpected watch close - watch lasted less than a second and no items received
2024-01-22T13:14:48.804215159Z stderr E0122 13:14:48.804128 1 leaderelection.go:325] error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://XXXX:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": dial tcp XXXX:443: connect: connection refused
2024-01-22T13:15:10.85823938Z stderr E0122 13:15:10.858155 1 leaderelection.go:325] error retrieving resource lock kube-system/aws-load-balancer-controller-leader: Get "https://XXXX:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": context deadline exceeded
As for steps to reproduce, we don't know; this hasn't happened before. It just happened randomly on one cluster, then a day later on a different cluster, and the logs here are from a third cluster.
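To see which replica last held the lock and when it last renewed it, the leader record that client-go stores on the ConfigMap can be inspected. A minimal sketch, assuming a workstation kubeconfig with read access to kube-system; the annotation key is the one client-go uses for ConfigMap-based locks, everything else here is illustrative.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: run from a workstation using the default kubeconfig;
	// inside a pod you would use rest.InClusterConfig() instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// The lock ConfigMap named in the errors above.
	cm, err := cs.CoreV1().ConfigMaps("kube-system").Get(context.TODO(),
		"aws-load-balancer-controller-leader", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// client-go keeps the leader record (holder identity, acquire and renew
	// times) in this annotation on ConfigMap-based locks.
	fmt.Println(cm.Annotations["control-plane.alpha.kubernetes.io/leader"])
}
```

Comparing the renew timestamp in that record against the timestamps of the errors can help tell a controller-side stall apart from the API server simply being unreachable.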
Just had it again:
E0125 12:52:28.540797 1 leaderelection.go:361] Failed to update lock: Put "https://xxxx:443/api/v1/namespaces/kube-system/configmaps/aws-load-balancer-controller-leader": context deadline exceeded
msg: problem running manager
And then a restart happened.
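For what it's worth, that sequence matches how client-go's leader election behaves: once lock renewal keeps failing past the renew deadline, the elector stops leading, the process exits, and kubelet restarts the pod. A minimal sketch of that loop, assuming a ConfigMap lock like the one in the logs; the identity, timings, and the POD_NAME environment variable are illustrative assumptions, and newer client-go versions use Lease-based locks instead.

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// ConfigMap-based lock matching the name in the log messages; the
	// Identity taken from POD_NAME is an illustrative assumption.
	lock, err := resourcelock.New(
		resourcelock.ConfigMapsResourceLock,
		"kube-system",
		"aws-load-balancer-controller-leader",
		cs.CoreV1(),
		cs.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	)
	if err != nil {
		log.Fatal(err)
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // illustrative timings
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Normal controller work happens here while this replica leads.
			},
			OnStoppedLeading: func() {
				// Fires when renewals keep failing past RenewDeadline, e.g.
				// while the API server is unreachable; exiting here is what
				// turns a transient outage into a pod restart.
				log.Println("lost leader election lock, exiting")
				os.Exit(1)
			},
		},
	})
}
```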
@omfurman-ma, hey, how many controller pods do you have? This doesn't look like a controller issue to me; it looks like a connectivity issue to the API server. Was there any upgrade or restart before this issue occurred?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Getting the same issue: my ALB controller pod is getting restarted with the same logs. @oliviassss FYI, there was no update/upgrade before this issue occurred. Could you please look into fixing this in the next release, or suggest what on the AWS EKS side could be causing it?
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.