aws-load-balancer-controller icon indicating copy to clipboard operation
aws-load-balancer-controller copied to clipboard

keepTLSSecret value not being honoured

Open lcaproni-pp opened this issue 3 years ago • 49 comments

Describe the bug

There was an update here https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/2264 - to allow users of the helm chart to keep the TLS secrets as they are and to stop non-empty diffs running every plan/apply. However upon setting the value the chart still generates diffs for the TLS certs.

Steps to reproduce

1 - Install helm chart & set the keepTLSSecret value to true 2 - Run a helm diff after installation 3 - Observe output and notice that diffs are still being generated

Expected outcome The webhook secret that already contains the TLS certs should be re-used when using the keepTLSSecret value within the helm chart.

It could be that i'm missing some configuration or something but there is nothing that seems obvious to me.

Environment

  • AWS Load Balancer controller version - v2.3.0 -Helm chart v1.3.1
  • Kubernetes version - 1.21
  • Using EKS (yes/no), if so version? Yes - 1.21

lcaproni-pp avatar Oct 21 '21 12:10 lcaproni-pp

@lcaproni-pp Can you try to use helm upgrade after setting the keepTLSSecret value to true? And then check if the key and cert in aws-load-balancer-tls are unchanged.

oliviassss avatar Oct 22 '21 18:10 oliviassss

@lcaproni-pp we use lookup function to query the current secret, however, according to the doc:

Helm is not supposed to contact the Kubernetes API Server during a helm template or a helm install|update|delete|rollback --dry-run, so the lookup function will return an empty list (i.e. dict) in such a case.

And helm diff is actually using helm upgrade --debug --dry-run (ref link), that's why lookup doesn't work in helm diff either, so the TLS secret changed in the output. Please try to use helm upgrade directly as a workaround, and let us know if you have further questions.

oliviassss avatar Oct 22 '21 22:10 oliviassss

We use https://github.com/hashicorp/terraform-provider-helm to install charts into clusters and face the same issue. This is the output of terraform apply command:

  ~ resource "helm_release" "aws_loadbalancer_controller" {
        id                         = "aws-load-balancer-controller"
      ~ manifest                   = jsonencode(
          ~ {
              ~ ing/secret/v1/aws-load-balancer-tls                                                                                             = {
                  ~ data       = {
                      ~ ca.crt  = "KHNlbnNpdGl2ZSB2YWx1ZSBiNjg3ZTg4OTM5ZDQwYWFjKQ==" -> "KHNlbnNpdGl2ZSB2YWx1ZSAzZjA3M2I1MDY3MGJiZTkxKQ=="
                      ~ tls.crt = "KHNlbnNpdGl2ZSB2YWx1ZSBlODUzODMyOWQ5OTkwZTA0KQ==" -> "KHNlbnNpdGl2ZSB2YWx1ZSBiMDcyY2E5NDYzN2NmMDkzKQ=="
                      ~ tls.key = "KHNlbnNpdGl2ZSB2YWx1ZSA0MWQ4MjFkZDAyMjllYjg0KQ==" -> "KHNlbnNpdGl2ZSB2YWx1ZSBiMzA4NDFmMWZlNjYzMThiKQ=="
                    }
                    # (4 unchanged elements hidden)
                }

Each time when we run terraform apply certs are re-generated again.

Environment terraform helm provider version: 2.3.0 (https://registry.terraform.io/providers/hashicorp/helm/latest/docs#experiments is enabled) AWS Load Balancer controller version - v2.3.0 -Helm chart v1.3.1 Kubernetes version - 1.21 Using EKS (yes/no), if so version? Yes - 1.21 keepTLSSecret is set to true

mglotov avatar Oct 25 '21 05:10 mglotov

@oliviassss Thanks for this, I tried doing

helm upgrade --debug --dry-run aws-load-balancer-controller eks/aws-load-balancer-controller -f values.yaml

I had to manually compare the certs with the ones that are already stored on the cluster and they didn't match up at least from what I seen so my deduction was that they would have been changing from what was already stored on the cluster. I will say also that our pipelines are doing a helm install/apply || helmfile apply when the PR is merged and checking the pipeline logs the chart was trying to update the values.

Values file contents just for reference:

clusterName: CLUSTERNAME

autoDiscoverAwsVpcID: true

awsRegion: eu-west-1

keepTLSSecret: true

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: XXXXXXXX

lcaproni-pp avatar Oct 25 '21 08:10 lcaproni-pp

Can confirm this does not work when running helmfile apply.

ypicard avatar Oct 26 '21 04:10 ypicard

not working with Pulumi helm module either

tamirhad avatar Oct 26 '21 15:10 tamirhad

With the recent chart version v1.3.x, we've also added support for directly specifying the TLS certificates/keys during chart install. Here is the snippet from the values.yaml:

# webhookTLS specifies TLS cert/key for the webhook
webhookTLS:
  caCert:
  cert:
  key:

This method doesn't depend on querying the cluster for existing secret, and provide consistent results with the --dry-run or helm template without --validate. Please see if this option is useful in this situation. Note: you need to specify all three attributes - caCert, cert and the key.

kishorj avatar Oct 27 '21 00:10 kishorj

Is this the required way to do things, or is it actually bug to still view a diff when keepTLSSecret is set to true?

ypicard avatar Oct 27 '21 04:10 ypicard

keepTLSSecret option doesn't either work in 2.3 release with newest helm chart - we use argocd for deployments.

jdomag avatar Oct 27 '21 07:10 jdomag

Siding with @ypicard on this one here, if this is the recommended way to go then the keeptls feature should be removed and docs should be updated so users know to do this. I know it's a feature but personally I didn't want to go through the hassle of doing this X times and then every X time I deploy this helm chart into a new cluster. If that's how it has to be then fair enough but since it wasn't previously I hoped it would stay the same.

lcaproni-pp avatar Oct 27 '21 07:10 lcaproni-pp

@jdomag I haven't tried it yet (I use helmfile not ArgoCD for deploying this project, nor have tried the 2.3 chart myself yet) but there supposedly is an ArgoCD workaround. See this comment to let ArgoCD ignore changes. Please let us know if this still works or the config needs updating for the latest helm chart.

Feedback on 2.3 - using lookup in the helm templating will not work in many non-vanilla helm scenarios & is probably the cause for the continuing problems (this issue) - please reimplement the fix.

tyrken avatar Oct 27 '21 08:10 tyrken

@tyrken Yep I saw that, however adding ignore to ArgoCD is just a workaround, not a problem solution. Thanks for a hint though!

jdomag avatar Oct 27 '21 11:10 jdomag

Hey @oliviassss @kishorj is there any update on this at all? Is the offical recommended way to do this generate our own secrets and store them inside the cluster or is there a bug that needs addressing? Happy to help out in some way if possible :)

lcaproni-pp avatar Nov 30 '21 16:11 lcaproni-pp

@kishorj or anyone: !!urgent help!! is there any workaround for something similar to keepTLSSecret=true? as this configuration is not working with

app.kubernetes.io/version=v2.3.1 helm.sh/chart=aws-load-balancer-controller-1.3.3

If I have to assign the initially created TLS values to the webhookTLS config params, is there any way?

gitmaniak avatar Jan 30 '22 20:01 gitmaniak

@kishorj or anyone: !!urgent help!! is there any workaround for something similar to keepTLSSecret=true? as this configuration is not working with

app.kubernetes.io/version=v2.3.1 helm.sh/chart=aws-load-balancer-controller-1.3.3

If I have to assign the initially created TLS values to the webhookTLS config params, is there any way?

we are using cert-manager and not facing this issue anymore.

tamirhad avatar Jan 30 '22 20:01 tamirhad

sorry, I am new to the kubernetes world, so @tamirhad if we enable cert-manager, are the certs auto generated and maintained? or is it app teams responsibility to load it somewhere (secrets or aws acm) and reference it? Also, is there any option to not use TLS? our plan is to terminate TLS at the load balancer level.

gitmaniak avatar Jan 30 '22 20:01 gitmaniak

sorry, I am new to the kubernetes world, so @tamirhad if we enable cert-manager, are the certs auto generated and maintained? or is it app teams responsibility to load it somewhere (secrets or aws acm) and reference it? Also, is there any option to not use TLS? our plan is to terminate TLS at the load balancer level.

The cert manager will create the certificates for you. It's not the external certificates you are using on the LB(these are configured via annotations). You will need to install the cert-manager before installing aws-lb-ctrlr in order to utilize this feature.

tamirhad avatar Jan 30 '22 22:01 tamirhad

@tamirhad thank you, that did the trick! I have to add this config "enableCertManager": true on the aws load balancer controller

gitmaniak avatar Jan 31 '22 02:01 gitmaniak

This is a workaround correct? We should still expect this to work at some point?

ypicard avatar Feb 01 '22 17:02 ypicard

using a cert-manager is not a workaround. we let the cert-manager to handle the self signed certificates.

tamirhad avatar Feb 01 '22 17:02 tamirhad

I'm siding with the workaround theory, I don't want to have to enable cert-manager as it's more things to think about also this used to just work. It was a recent upgrade at some point that made it break, I know this because we have an older cluster with a old version of this controller installed and it doesn't hit these issues.

lcaproni-pp avatar Feb 01 '22 19:02 lcaproni-pp

I'm siding with the workaround theory, I don't want to have to enable cert-manager as it's more things to think about also this used to just work. It was a recent upgrade at some point that broke it, I know this because We have an older cluster with a old version of this controller installed and it doesn't hit these issues.

You are right it's should be fixed in any case, we are using cert-manager anyway so it's not quite a workaround for us. If people around here are using cert-manager I don't see why not enabling it here as well.

tamirhad avatar Feb 02 '22 13:02 tamirhad

Good afternoon,

We are running aws-load-balancer-controller (chart version: 1.3.3 / app version v2.3.1) through Argo CD (version 2.1.7) also and run into this.

As for what we have understood if we deploy cert-manager and enable it on aws-load-balancer-controller chart enableCertManager: true, then we don't need to enable keepTLSSecret, do we?

We have done it but now we are getting booth, the ValidatingWebhookConfiguration and the MutatingWebhookConfiguration, re-generated in loop, seems like every second it generates a new one, I attach logs from the cert-manager ca injector:

I0202 17:59:53.727615 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:53.741527 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:53.761636 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:53.849133 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.618380 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.648908 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.649341 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.664550 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.842997 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.851424 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.906034 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.958196 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.391696 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.460421 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.549418 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.551209 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1"

We think this is because Argo is trying to auto-sync all the time, are we missing something? if we are using Argo CD we still have to do the following "workaround"?

Thank you so much in advance 🙇

srosell-ut avatar Feb 02 '22 18:02 srosell-ut

We're also hitting this but oddly only in one cluster. We're using Terraform's helm provider to manage the chart. No amount of nuke, create or updates allows this to converge. I've tried keepTLSSecret=true to no avail. We don't use cert-manager so that's a none option for us.

The only thing that's "worked" is the heavy hammer to ignore drift in Terraform 🤮:

resource "helm_release" "aws-load-balancer-controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  version    = "1.3.3"

  namespace = "kube-system"

  values = [
    yamlencode(
      {
        serviceAccount = {
          annotations = {
            "eks.amazonaws.com/role-arn" = "${var.aws_load_balancer_controller_role_arn}"
          }
        }
      }
    )
  ]

  set {
    name  = "clusterName"
    value = var.name
  }

  set {
    name  = "replicaCount"
    value = 1
  }

  set {
    name  = "keepTLSSecret"
    value = true
  }

  lifecycle {
    ignore_changes = [
      manifest
    ]
  }
}

Is there any way to get things to converge or to ignore TLS changes?

As an aside, this issue only impacts Terraform if manifest rendering is enabled for drift detection in the provider:

provider "helm" {
  kubernetes {
    ...
  }

  experiments {
    // Render Helm manifest so we get full diffs
    manifest = true
  }
}

When manifest are off, things immediately converge as the underlying diffs are not presented to Terraform. We ended up disabling manifest and were able to workaround this issue.

nigelellis avatar Feb 08 '22 17:02 nigelellis

My current workaround is to read the values from the aws-load-balancer-tls and specify those for the webhookTLS.* values.

data "kubernetes_secret" "alb-tls" {
  metadata {
    name      = "aws-load-balancer-tls"
    namespace = "kube-system"
  }
  binary_data = {
    "ca.crt"  = ""
    "tls.crt" = ""
    "tls.key" = ""
  }
}

resource "helm_release" "aws-alb-ingress-controller" {
  name       = "aws-load-balancer-controller"
  namespace  = "kube-system"
  chart      = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  version    = "1.4.0"

  set {
    name  = "webhookTLS.caCert"
    value = base64decode(lookup(data.kubernetes_secret.alb-tls.binary_data, "ca.crt"))
  }

  set {
    name  = "webhookTLS.cert"
    value = base64decode(lookup(data.kubernetes_secret.alb-tls.binary_data, "tls.crt"))
  }

  set {
    name  = "webhookTLS.key"
    value = base64decode(lookup(data.kubernetes_secret.alb-tls.binary_data, "tls.key"))
  }
  ...

Of course, this only works once the chart has been installed, since the secret needs to exist already. A slightly better workaround would be to also generate the TLS secrets with terraform.

skoob avatar Mar 01 '22 07:03 skoob

@tamirhad thank you, that did the trick! I have to add this config "enableCertManager": true on the aws load balancer controller

@gitmaniak @tamirhad when you set enableCertManager: true how do you configure keepTLSSecret ? true or false ?

ajmal-bash avatar Mar 10 '22 15:03 ajmal-bash

when you set enableCertManager: true how do you configure keepTLSSecret ? true or false ?

@ajmal-bash when using enableCertManager: true , the keepTLSSecret value is being ignored by the webhook template, see e.g. https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/c1b95793b0fc3902caf795ac5b5e77fd676aab81/helm/aws-load-balancer-controller/templates/webhook.yaml#L15

philicious avatar Apr 19 '22 16:04 philicious

The workaround to use cert-manager isn't working 100% with ArgoCD as it's showing the two webhooks permanently out of sync.

The problem I think is that the Helm chart is templating in the "empty" Cg== CA bundle, and because the webhooks have the cert-manager annotations added, cert-manager's own mutating webhooks then replace the CA bundle fields with the correct values. ArgoCD then shows a diff as the original copy of the manifest has Cg== which doesn't match the live version.

I think the fix here is to not template any CA bundle in at all if using cert-manager, i.e.:

{{ if not $.Values.enableCertManager }}
caBundle: {{ $tls.caCert }}
{{ end }}

That way ArgoCD shouldn't care when the live version of the manifest has that field, based on another in-house webhook which also uses cert-manager but doesn't show this permanent diff.

bodgit avatar Jun 17 '22 13:06 bodgit

@bodgit We have the same issue when using the helm chart through tanka and I came to the same conclusion. Have an open PR to add exactly what you are suggesting, https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/2649.

mikael-lindstrom avatar Jun 17 '22 13:06 mikael-lindstrom

I'll delete my fork of the repo and keep an eye on your PR then :smile:

bodgit avatar Jun 17 '22 13:06 bodgit