aws-load-balancer-controller
AWS Load Balancer Controller keeps trying to reconcile the ALB with a non-existent WAF, even without the alb.ingress.kubernetes.io/wafv2-acl-arn annotation on the ingress resource
Describe the bug
AWS Load Balancer Controller keeps trying to reconcile the waf-acl association between the provisioned ALB and the ingress, even though the WAFv2 web ACLs have been deleted and the alb.ingress.kubernetes.io/wafv2-acl-arn annotation has been removed from the Ingress resource.
Steps to reproduce
I am a CSE from AWS Support. I will explain the problem the customer was having and the steps I have taken trying to reproduce the issue, with no luck so far.
The customer explained that the WAFv2 web ACLs were deleted from their AWS accounts weeks ago, and that the ingress resources for their EKS cluster previously carried the following annotation:
# alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:us-east-1:XXXXXXX:regional/webacl/xxx-waf-xxxx/xxxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxx
After commenting out the annotation on their ingress resources, the customer noticed the following error on the ingress, which kept recurring:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 7m39s (x665 over 6d9h) ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
During our call we were not able to find the root cause, but we did find a mitigation: upgrading the AWS Load Balancer Controller to the latest available version, 2.6.2. Since v2.5.3, the AWS Load Balancer Controller supports disabling the reconciliation of the waf-acl association with the provisioned ALB. Once disabled, the controller takes no action on the WAF addons of the provisioned ALBs.
We performed the following steps to upgrade the AWS Load Balancer Controller to the latest version and add the flags that disable waf-acl association reconciliation:
Step 1: Update AWSLoadBalancerControllerIAMPolicy to the latest version available: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.6.2/docs/install/iam_policy.json
Step 2: Upgrade the AWS Load Balancer Controller with helm upgrade:
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=my-cluster \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller \
--set region=region-code \
--set vpcId=vpc-xxxxxxxx
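After the upgrade, it may help to confirm which chart and controller image are actually running; this quick check is my addition, not a step from the original engagement, and assumes the default release and Deployment names shown above:

```shell
# List the Helm release to see the chart and app version Helm believes is deployed.
helm list -n kube-system --filter aws-load-balancer-controller

# Confirm the controller image the Deployment is actually running.
kubectl -n kube-system get deployment aws-load-balancer-controller \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```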
Step 3: Add the flags below to the AWS Load Balancer Controller Deployment under spec.template.spec.containers[].args:
- --enable-waf=false
- --enable-wafv2=false
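As an alternative to editing the Deployment by hand, the same flags can be appended with a JSON patch; this is a sketch I am adding for convenience, assuming the controller container is the first (index 0) in the pod spec:

```shell
# Append both flags to the controller container's args array ("-" appends per RFC 6902).
kubectl -n kube-system patch deployment aws-load-balancer-controller \
  --type=json \
  -p='[
    {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-waf=false"},
    {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-wafv2=false"}
  ]'
```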
Once the flags had been added, we confirmed the Ingress resources showed successful reconciliation:
ingress Successfully reconciled
However, when we tested removing the flags, the problem started again:
ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
even though the alb.ingress.kubernetes.io/wafv2-acl-arn annotation is no longer present on the ingress.
Steps I tried to reproduce the issue:
Step 1: Install the AWS Load Balancer Controller, version 2.6.2.
Step 2: Install a minimal ingress with the alb.ingress.kubernetes.io/wafv2-acl-arn annotation pointing to a non-existent WAF:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  namespace: fargate
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:ap-southeast-2:XXXXXXXXXX:regional/webacl/fake-waf/XXXXX-XXXXX-XXX-XXXX-XXXXX
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /testpath
            pathType: Prefix
            backend:
              service:
                name: test
                port:
                  number: 80
And I could see the error:
Warning FailedDeployModel 15s ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
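To keep watching how often the controller retries, the ingress events can be pulled directly; this is my addition, and the `fargate` namespace and `minimal-ingress` name match the manifest above:

```shell
# Show the warning events emitted for the ingress during reconciliation.
kubectl -n fargate describe ingress minimal-ingress

# Or filter the event stream to just this object, newest last.
kubectl -n fargate get events \
  --field-selector involvedObject.kind=Ingress,involvedObject.name=minimal-ingress \
  --sort-by=.lastTimestamp
```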
Step 3: Then I removed the annotation from the ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  namespace: fargate
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /testpath
            pathType: Prefix
            backend:
              service:
                name: test
                port:
                  number: 80
And I could see the reconciliation succeed:
Normal SuccessfullyReconciled 3s (x2 over 6m21s) ingress Successfully reconciled
The customer's ingresses run on Fargate; I tried the above in both a Fargate namespace and the default namespace and got the same results. This is not specific to a particular EKS cluster or AWS environment, as the customer faces this problem across their dev, stage, and prod EKS clusters, which run in different AWS accounts.
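One way to check whether a stale association still exists on the AWS side (a possible source of the retries, though not something verified in this thread) is to query WAFv2 directly for the load balancer; the ALB ARN below is a placeholder for the real one:

```shell
# Returns the web ACL currently associated with the load balancer, if any;
# an empty result or WAFNonexistentItemException means no association exists.
aws wafv2 get-web-acl-for-resource \
  --resource-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/my-alb/1234567890abcdef
```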
I would appreciate it if anyone from the AWS Load Balancer Controller team could advise why, in the customer's environment, the controller keeps trying to reconcile the ALB with a non-existent WAF even without the alb.ingress.kubernetes.io/wafv2-acl-arn annotation on the ingress resource.
Expected outcome
Given that the alb.ingress.kubernetes.io/wafv2-acl-arn annotation is no longer on the ingress resource, we would expect reconciliation to succeed without needing the flags below in the AWS Load Balancer Controller Deployment spec.template.spec.containers[].args:
- --enable-waf=false
- --enable-wafv2=false
Environment
- AWS Load Balancer controller version: 2.6.2
- Kubernetes version: 1.28
- Using EKS (yes/no), if so version? Yes, 1.28
Additional context: I am a CSE from AWS Support; please reach out to me internally if you want to discuss anything privately.
@maiconrocha, would you be able to share the logs from when the cx hit this issue on the older version? You can send them to me internally if you're not able to share them in the GH issue. Thanks.
Thanks @oliviassss. I don't have the full logs; I asked the customer to monitor this thread so they can provide further logs if required. Are you interested in the AWS Load Balancer Controller logs or something else? Please note the customer also faces the issue on the latest version, 2.6.2. When we removed the flags
- --enable-waf=false
- --enable-wafv2=false
the problem started again:
ingress Failed deploy model due to failed to create WAFv2 webACL association on LoadBalancer: WAFNonexistentItemException: AWS WAF couldn’t perform the operation because your resource doesn’t exist.
Once the flags were re-added, we confirmed the Ingress resources showed successful reconciliation:
ingress Successfully reconciled
@maiconrocha, I'll try to reproduce; yes, the controller logs will also help us investigate further. Thanks.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.