aws-load-balancer-controller
aws-load-balancer-controller copied to clipboard
keepTLSSecret value not being honoured
Describe the bug
There was an update here https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/2264 - to allow users of the helm chart to keep the TLS secrets as they are and to stop non-empty diffs running every plan/apply. However upon setting the value the chart still generates diffs for the TLS certs.
Steps to reproduce
1 - Install helm chart & set the keepTLSSecret value to true
2 - Run a helm diff after installation
3 - Observe output and notice that diffs are still being generated
Expected outcome The webhook secret that already contains the TLS certs should be re-used when using the keepTLSSecret value within the helm chart.
It could be that i'm missing some configuration or something but there is nothing that seems obvious to me.
Environment
- AWS Load Balancer controller version - v2.3.0 -Helm chart v1.3.1
- Kubernetes version - 1.21
- Using EKS (yes/no), if so version? Yes - 1.21
@lcaproni-pp Can you try to use helm upgrade
after setting the keepTLSSecret value to true
? And then check if the key and cert in aws-load-balancer-tls are unchanged.
@lcaproni-pp we use lookup
function to query the current secret, however, according to the doc:
Helm is not supposed to contact the Kubernetes API Server during a helm template or a helm install|update|delete|rollback --dry-run, so the lookup function will return an empty list (i.e. dict) in such a case.
And helm diff
is actually using helm upgrade --debug --dry-run
(ref link), that's why lookup
doesn't work in helm diff
either, so the TLS secret changed in the output.
Please try to use helm upgrade
directly as a workaround, and let us know if you have further questions.
We use https://github.com/hashicorp/terraform-provider-helm to install charts into clusters and face the same issue.
This is the output of terraform apply
command:
~ resource "helm_release" "aws_loadbalancer_controller" {
id = "aws-load-balancer-controller"
~ manifest = jsonencode(
~ {
~ ing/secret/v1/aws-load-balancer-tls = {
~ data = {
~ ca.crt = "KHNlbnNpdGl2ZSB2YWx1ZSBiNjg3ZTg4OTM5ZDQwYWFjKQ==" -> "KHNlbnNpdGl2ZSB2YWx1ZSAzZjA3M2I1MDY3MGJiZTkxKQ=="
~ tls.crt = "KHNlbnNpdGl2ZSB2YWx1ZSBlODUzODMyOWQ5OTkwZTA0KQ==" -> "KHNlbnNpdGl2ZSB2YWx1ZSBiMDcyY2E5NDYzN2NmMDkzKQ=="
~ tls.key = "KHNlbnNpdGl2ZSB2YWx1ZSA0MWQ4MjFkZDAyMjllYjg0KQ==" -> "KHNlbnNpdGl2ZSB2YWx1ZSBiMzA4NDFmMWZlNjYzMThiKQ=="
}
# (4 unchanged elements hidden)
}
Each time when we run terraform apply
certs are re-generated again.
Environment
terraform helm provider version: 2.3.0 (https://registry.terraform.io/providers/hashicorp/helm/latest/docs#experiments is enabled)
AWS Load Balancer controller version - v2.3.0 -Helm chart v1.3.1
Kubernetes version - 1.21
Using EKS (yes/no), if so version? Yes - 1.21
keepTLSSecret is set to true
@oliviassss Thanks for this, I tried doing
helm upgrade --debug --dry-run aws-load-balancer-controller eks/aws-load-balancer-controller -f values.yaml
I had to manually compare the certs with the ones that are already stored on the cluster and they didn't match up at least from what I seen so my deduction was that they would have been changing from what was already stored on the cluster. I will say also that our pipelines are doing a helm install/apply || helmfile apply
when the PR is merged and checking the pipeline logs the chart was trying to update the values.
Values file contents just for reference:
clusterName: CLUSTERNAME
autoDiscoverAwsVpcID: true
awsRegion: eu-west-1
keepTLSSecret: true
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: XXXXXXXX
Can confirm this does not work when running helmfile apply
.
not working with Pulumi helm module either
With the recent chart version v1.3.x, we've also added support for directly specifying the TLS certificates/keys during chart install. Here is the snippet from the values.yaml:
# webhookTLS specifies TLS cert/key for the webhook
webhookTLS:
caCert:
cert:
key:
This method doesn't depend on querying the cluster for existing secret, and provide consistent results with the --dry-run
or helm template
without --validate
. Please see if this option is useful in this situation. Note: you need to specify all three attributes - caCert
, cert
and the key
.
Is this the required way to do things, or is it actually bug to still view a diff when keepTLSSecret
is set to true
?
keepTLSSecret option doesn't either work in 2.3 release with newest helm chart - we use argocd for deployments.
Siding with @ypicard on this one here, if this is the recommended way to go then the keeptls feature should be removed and docs should be updated so users know to do this. I know it's a feature but personally I didn't want to go through the hassle of doing this X times and then every X time I deploy this helm chart into a new cluster. If that's how it has to be then fair enough but since it wasn't previously I hoped it would stay the same.
@jdomag I haven't tried it yet (I use helmfile not ArgoCD for deploying this project, nor have tried the 2.3 chart myself yet) but there supposedly is an ArgoCD workaround. See this comment to let ArgoCD ignore changes. Please let us know if this still works or the config needs updating for the latest helm chart.
Feedback on 2.3 - using lookup
in the helm templating will not work in many non-vanilla helm scenarios & is probably the cause for the continuing problems (this issue) - please reimplement the fix.
@tyrken Yep I saw that, however adding ignore to ArgoCD is just a workaround, not a problem solution. Thanks for a hint though!
Hey @oliviassss @kishorj is there any update on this at all? Is the offical recommended way to do this generate our own secrets and store them inside the cluster or is there a bug that needs addressing? Happy to help out in some way if possible :)
@kishorj or anyone: !!urgent help!! is there any workaround for something similar to keepTLSSecret=true? as this configuration is not working with
app.kubernetes.io/version=v2.3.1 helm.sh/chart=aws-load-balancer-controller-1.3.3
If I have to assign the initially created TLS values to the webhookTLS config params, is there any way?
@kishorj or anyone: !!urgent help!! is there any workaround for something similar to keepTLSSecret=true? as this configuration is not working with
app.kubernetes.io/version=v2.3.1 helm.sh/chart=aws-load-balancer-controller-1.3.3
If I have to assign the initially created TLS values to the webhookTLS config params, is there any way?
we are using cert-manager and not facing this issue anymore.
sorry, I am new to the kubernetes world, so @tamirhad if we enable cert-manager, are the certs auto generated and maintained? or is it app teams responsibility to load it somewhere (secrets or aws acm) and reference it? Also, is there any option to not use TLS? our plan is to terminate TLS at the load balancer level.
sorry, I am new to the kubernetes world, so @tamirhad if we enable cert-manager, are the certs auto generated and maintained? or is it app teams responsibility to load it somewhere (secrets or aws acm) and reference it? Also, is there any option to not use TLS? our plan is to terminate TLS at the load balancer level.
The cert manager will create the certificates for you. It's not the external certificates you are using on the LB(these are configured via annotations). You will need to install the cert-manager before installing aws-lb-ctrlr in order to utilize this feature.
@tamirhad thank you, that did the trick! I have to add this config "enableCertManager": true on the aws load balancer controller
This is a workaround correct? We should still expect this to work at some point?
using a cert-manager is not a workaround. we let the cert-manager to handle the self signed certificates.
I'm siding with the workaround theory, I don't want to have to enable cert-manager as it's more things to think about also this used to just work. It was a recent upgrade at some point that made it break, I know this because we have an older cluster with a old version of this controller installed and it doesn't hit these issues.
I'm siding with the workaround theory, I don't want to have to enable cert-manager as it's more things to think about also this used to just work. It was a recent upgrade at some point that broke it, I know this because We have an older cluster with a old version of this controller installed and it doesn't hit these issues.
You are right it's should be fixed in any case, we are using cert-manager anyway so it's not quite a workaround for us. If people around here are using cert-manager I don't see why not enabling it here as well.
Good afternoon,
We are running aws-load-balancer-controller (chart version: 1.3.3 / app version v2.3.1) through Argo CD (version 2.1.7) also and run into this.
As for what we have understood if we deploy cert-manager and enable it on aws-load-balancer-controller chart enableCertManager: true
, then we don't need to enable keepTLSSecret
, do we?
We have done it but now we are getting booth, the ValidatingWebhookConfiguration and the MutatingWebhookConfiguration, re-generated in loop, seems like every second it generates a new one, I attach logs from the cert-manager ca injector:
I0202 17:59:53.727615 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:53.741527 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:53.761636 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:53.849133 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.618380 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.648908 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.649341 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 17:59:58.664550 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.842997 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.851424 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.906034 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:04.958196 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.391696 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.460421 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.549418 1 controller.go:178] cert-manager/certificate/validatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="ValidatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1" I0202 18:00:09.551209 1 controller.go:178] cert-manager/certificate/mutatingwebhookconfiguration/generic-inject-reconciler "msg"="updated object" "resource_kind"="MutatingWebhookConfiguration" "resource_name"="aws-load-balancer-webhook" "resource_namespace"="" "resource_version"="v1"
We think this is because Argo is trying to auto-sync all the time, are we missing something? if we are using Argo CD we still have to do the following "workaround"?
Thank you so much in advance 🙇
We're also hitting this but oddly only in one cluster. We're using Terraform's helm
provider to manage the chart. No amount of nuke, create or updates allows this to converge. I've tried keepTLSSecret=true
to no avail. We don't use cert-manager so that's a none option for us.
The only thing that's "worked" is the heavy hammer to ignore drift in Terraform 🤮:
resource "helm_release" "aws-load-balancer-controller" {
name = "aws-load-balancer-controller"
repository = "https://aws.github.io/eks-charts"
chart = "aws-load-balancer-controller"
version = "1.3.3"
namespace = "kube-system"
values = [
yamlencode(
{
serviceAccount = {
annotations = {
"eks.amazonaws.com/role-arn" = "${var.aws_load_balancer_controller_role_arn}"
}
}
}
)
]
set {
name = "clusterName"
value = var.name
}
set {
name = "replicaCount"
value = 1
}
set {
name = "keepTLSSecret"
value = true
}
lifecycle {
ignore_changes = [
manifest
]
}
}
Is there any way to get things to converge or to ignore TLS changes?
As an aside, this issue only impacts Terraform if manifest rendering is enabled for drift detection in the provider:
provider "helm" {
kubernetes {
...
}
experiments {
// Render Helm manifest so we get full diffs
manifest = true
}
}
When manifest are off, things immediately converge as the underlying diffs are not presented to Terraform. We ended up disabling manifest and were able to workaround this issue.
My current workaround is to read the values from the aws-load-balancer-tls
and specify those for the webhookTLS.*
values.
data "kubernetes_secret" "alb-tls" {
metadata {
name = "aws-load-balancer-tls"
namespace = "kube-system"
}
binary_data = {
"ca.crt" = ""
"tls.crt" = ""
"tls.key" = ""
}
}
resource "helm_release" "aws-alb-ingress-controller" {
name = "aws-load-balancer-controller"
namespace = "kube-system"
chart = "aws-load-balancer-controller"
repository = "https://aws.github.io/eks-charts"
version = "1.4.0"
set {
name = "webhookTLS.caCert"
value = base64decode(lookup(data.kubernetes_secret.alb-tls.binary_data, "ca.crt"))
}
set {
name = "webhookTLS.cert"
value = base64decode(lookup(data.kubernetes_secret.alb-tls.binary_data, "tls.crt"))
}
set {
name = "webhookTLS.key"
value = base64decode(lookup(data.kubernetes_secret.alb-tls.binary_data, "tls.key"))
}
...
Of course, this only works once the chart has been installed, since the secret needs to exist already. A slightly better workaround would be to also generate the TLS secrets with terraform.
@tamirhad thank you, that did the trick! I have to add this config "enableCertManager": true on the aws load balancer controller
@gitmaniak @tamirhad when you set enableCertManager: true
how do you configure keepTLSSecret
? true
or false
?
when you set
enableCertManager: true
how do you configurekeepTLSSecret
?true
orfalse
?
@ajmal-bash when using enableCertManager: true
, the keepTLSSecret
value is being ignored by the webhook template, see e.g. https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/c1b95793b0fc3902caf795ac5b5e77fd676aab81/helm/aws-load-balancer-controller/templates/webhook.yaml#L15
The workaround to use cert-manager isn't working 100% with ArgoCD as it's showing the two webhooks permanently out of sync.
The problem I think is that the Helm chart is templating in the "empty" Cg==
CA bundle, and because the webhooks have the cert-manager annotations added, cert-manager's own mutating webhooks then replace the CA bundle fields with the correct values. ArgoCD then shows a diff as the original copy of the manifest has Cg==
which doesn't match the live version.
I think the fix here is to not template any CA bundle in at all if using cert-manager, i.e.:
{{ if not $.Values.enableCertManager }}
caBundle: {{ $tls.caCert }}
{{ end }}
That way ArgoCD shouldn't care when the live version of the manifest has that field, based on another in-house webhook which also uses cert-manager but doesn't show this permanent diff.
@bodgit We have the same issue when using the helm chart through tanka and I came to the same conclusion. Have an open PR to add exactly what you are suggesting, https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/2649.
I'll delete my fork of the repo and keep an eye on your PR then :smile: