aws-load-balancer-controller
AWS Load Balancer Controller keeps trying to reconcile the ALB with a non-existent WAF, even without the alb.ingress.kubernetes.io/wafv2-acl-arn annotation on the ingress resource
Describe the bug
AWS Load Balancer Controller keeps trying to reconcile the waf-acl association between the provisioned ALB and the ingress, even though the WAFv2 web ACLs have been deleted and the alb.ingress.kubernetes.io/wafv2-acl-arn annotation has been removed from the Ingress resource.
Steps to reproduce
I am a CSE from AWS Support. I will explain the problem the customer was having and the steps I have taken trying to reproduce the issue, with no luck so far.
The customer explained that the WAFv2 web ACLs were deleted from their AWS accounts weeks ago, and that the ingress resources for their EKS cluster previously carried the following annotation:
# alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:us-east-1:XXXXXXX:regional/webacl/xxx-waf-xxxx/xxxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxx
After commenting out the annotation on their ingress resources, the customer noticed the following error on the ingress, which kept recurring:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 7m39s (x665 over 6d9h) ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
During our call we were not able to find the root cause, but we did find a mitigation: upgrading the AWS Load Balancer Controller to the latest available version, 2.6.2. Since v2.5.3, the AWS Load Balancer Controller supports disabling the reconciliation of the waf-acl association with the provisioned ALB. Once disabled, the controller takes no action on the WAF addons of the provisioned ALBs.
We performed the following steps to upgrade the AWS Load Balancer Controller to the latest version and add the flags that disable waf-acl association reconciliation:
Step 1: Update AWSLoadBalancerControllerIAMPolicy to the latest version available: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.6.2/docs/install/iam_policy.json
Step 2: Upgrade the AWS Load Balancer Controller with helm upgrade:
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=my-cluster \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller \
--set region=region-code \
--set vpcId=vpc-xxxxxxxx
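After the upgrade, it may help to confirm which chart and controller image are actually running; this quick check is my addition, not a step from the original engagement, and assumes the default release and Deployment names shown above:

```shell
# List the Helm release to see the chart and app version Helm believes is deployed.
helm list -n kube-system --filter aws-load-balancer-controller

# Confirm the controller image the Deployment is actually running.
kubectl -n kube-system get deployment aws-load-balancer-controller \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```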
Step 3: Add the flags below to the AWS Load Balancer Controller Deployment under spec.template.spec.containers[].args:
- --enable-waf=false
- --enable-wafv2=false
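As an alternative to editing the Deployment by hand, the same flags can be appended with a JSON patch; this is a sketch I am adding for convenience, assuming the controller container is the first (index 0) in the pod spec:

```shell
# Append both flags to the controller container's args array ("-" appends per RFC 6902).
kubectl -n kube-system patch deployment aws-load-balancer-controller \
  --type=json \
  -p='[
    {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-waf=false"},
    {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-wafv2=false"}
  ]'
```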
Once the flags had been added, we confirmed the Ingress resources showed successful reconciliation:
ingress Successfully reconciled
However, when we tested removing the flags, the problem started again:
ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
even though the alb.ingress.kubernetes.io/wafv2-acl-arn annotation is no longer present on the ingress.
Steps I tried to reproduce the issue:
Step 1: Install the AWS Load Balancer Controller, version 2.6.2.
Step 2: Install a minimal ingress with the alb.ingress.kubernetes.io/wafv2-acl-arn annotation pointing to a non-existent WAF:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  namespace: fargate
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:ap-southeast-2:XXXXXXXXXX:regional/webacl/fake-waf/XXXXX-XXXXX-XXX-XXXX-XXXXX
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /testpath
            pathType: Prefix
            backend:
              service:
                name: test
                port:
                  number: 80
And I could see the error:
Warning FailedDeployModel 15s ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
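To keep watching how often the controller retries, the ingress events can be pulled directly; this is my addition, and the `fargate` namespace and `minimal-ingress` name match the manifest above:

```shell
# Show the warning events emitted for the ingress during reconciliation.
kubectl -n fargate describe ingress minimal-ingress

# Or filter the event stream to just this object, newest last.
kubectl -n fargate get events \
  --field-selector involvedObject.kind=Ingress,involvedObject.name=minimal-ingress \
  --sort-by=.lastTimestamp
```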
Step 3: Then I removed the annotation from the ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  namespace: fargate
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /testpath
            pathType: Prefix
            backend:
              service:
                name: test
                port:
                  number: 80
And I could see the reconciliation succeed:
Normal SuccessfullyReconciled 3s (x2 over 6m21s) ingress Successfully reconciled
The customer's ingresses run on Fargate; I tried the above in both a Fargate namespace and the default namespace and got the same results. This is not specific to a particular EKS cluster or AWS environment, as the customer faces this problem across their dev, stage, and prod EKS clusters, which run in different AWS accounts.
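One way to check whether a stale association still exists on the AWS side (a possible source of the retries, though not something verified in this thread) is to query WAFv2 directly for the load balancer; the ALB ARN below is a placeholder for the real one:

```shell
# Returns the web ACL currently associated with the load balancer, if any;
# an empty result or WAFNonexistentItemException means no association exists.
aws wafv2 get-web-acl-for-resource \
  --resource-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/my-alb/1234567890abcdef
```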
I would appreciate it if anyone from the AWS Load Balancer Controller team could advise why, in the customer's environment, the controller keeps trying to reconcile the ALB with a non-existent WAF even without the alb.ingress.kubernetes.io/wafv2-acl-arn annotation on the ingress resource.
Expected outcome
Given that the alb.ingress.kubernetes.io/wafv2-acl-arn annotation is no longer on the ingress resource, we would expect reconciliation to succeed without needing the flags below in the AWS Load Balancer Controller Deployment spec.template.spec.containers[].args:
- --enable-waf=false
- --enable-wafv2=false
Environment
- AWS Load Balancer controller version: 2.6.2
- Kubernetes version: 1.28
- Using EKS (yes/no), if so version? Yes, 1.28
Additional context: I am a CSE from AWS Support; please reach out to me internally if you want to discuss anything privately.
@maiconrocha, would you be able to share the logs from when the cx hit this issue on the older version? You can send them to me internally if you're not able to share them in the GH issue. Thanks.
Thanks @oliviassss. I don't have the full logs; I asked the customer to monitor this thread so they can provide further logs if required. Are you interested in the AWS Load Balancer Controller logs or something else? Please note the customer also faces the issue on the latest version, 2.6.2. When we removed the flags
- --enable-waf=false
- --enable-wafv2=false
the problem started again:
ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
Once the flags were re-added, we confirmed the Ingress resources showed successful reconciliation:
ingress Successfully reconciled
@maiconrocha, I'll try to reproduce; yes, the controller logs will also help us investigate further. Thanks.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.