aws-load-balancer-controller
aws-load-balancer-controller copied to clipboard
[v2] Error when there are multiple certificates for a host
It is perfectly fine for there two be two certificates available for a particular host. I have this right now because I am in mid-migration and have both a wildcard, and host-specific, cert. In these cases, the controller should either add both certificates, or add (in my opinion) the least specific certificate.
{"level":"error","ts":1604259893.3354607,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"dex","namespace":"kube-system","error":"multiple certificate found for host: dex.admin.rex.sh, certARNs: [arn:aws:acm:us-west-2:355508092300:certificate/c6d7d07b-0aec-49ae-aff6-3616afb4b0a7 arn:aws:acm:us-west-2:355508092300:certificate/eeba7334-1671-4ad9-bc2f-d458ef972fdb]"}
@postmaxin
That's why we allow to specify the certificate-arn
annotation to override the cert-discovery feature. The controller cannot make a choice which certificate to use when there are multiple matches. 😄
It can, it just does not. It has no criteria. Example criteria would be: Length of domain name, expiry date, etc. If it makes a choice based on some criteria, it may not be the exact criteria everyone who uses AWSv2 would choose. However, it would be a choice, that is predictable, that breaks things less than having AWSv2 give up.
I would suggest that in this situation, the controller should pick a certificate in a deterministic way, but log a warning saying that there were other choices. That's better than it just throwing it's hands up and walking away.
@postmaxin I agree we can do it if we have a deterministic way. (e.g. pure sort based on lexical order of certificate arn). will it suits your need?
It is behaviour that changed between v1 and v2 I think, so I'd say it should do whatever v1 did to keep the 'backwards-compatible' promise :)
Scenario that I have ingress rules for "example.com", "*.example.com", "sub.example.com" and "*.sub.example.com". With two certificates, "*.example.com" with additional domains "example.com" and "*.sub.example.com" with additional domain "sub.example.com". So there is some overlap in the domain names. But this worked in v1, v2 complains that the sub.example.com ingress now matches two certificates, which is correct. But v1 seemed to pick one (the 'exact match')...?
@jorrit-wehelp it's same behavior with v1: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/master/internal/alb/ls/cert_discovery.go#L54 (if multiple certificate found, it will error out)
ALB supports multiple certificates, so why doesn't the controller just add them all?
I see the same issue, despite having the alb.ingress.kubernetes.io/certificate-arn
annotation configured.
The "old" certificate was used by the controller, after adding the new certificate and changing the configuration, it gets stuck.
After thinking hard, I noticed this: I have two ingress objects sharing an ingress group. As alb.ingress.kubernetes.io/certificate-arn is listed as "merge", I annotated only one Ingress in the group, assuming it would be used for all ingresses in the group. But the controller logged this message for the other ingress in the same group without the annotation.
So apparently the "determine-the-right-certificate" code runs for each ingress object individually, not for the merged annotations of an ingress group?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close
.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
This is an issue especially when you are using wildcard subdomains. You might want *.test.company.com but *.company.com already exists. There's no reason you can't add both.
So, for an org using wildcard subdomains, we have to manually specify the ARN for every ingress we provision?
edit: I found a workaround. The wildcard subdomain had a SAN that conflicted with the wildcard domain.
*.test.company.com with a SAN of test.company.com -- conflicts with *.company.com
Recreated the ACM certificate of *.test.company.com with no SAN, and the ingress controller added both *.test.company.com and *.company.com. The routes used in this case are *.test.company.com and test.company.com.
Still, I think it would be better if there was some deterministic logic and any choice made, instead of throwing the "could not auto-load certificates from ACM: multiple certificate found for host" error and giving up.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale
.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close
.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
This was a frustrating bug/behavior to find at the last minute when trying to go-live.
At the very least, could the AWS LBC not attempt auto-discovery beyond the list you give with the certificate-arn
annotation? That would limit the opportunity for confusion. Not that it should matter.
@M00nF1sh , There is any way to disable the Certificate Discovery?
I can't see any way to disable it on Docs: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/cert_discovery/
I really think that if I could manage all my ALB configuration level (such as certificates
, security groups
, ssl policy
, scheme
and etc..) in 1 place like a global ingress
and all other target group configuration level on my other ingress
resources it would be great.
I really think that if I could manage all my ALB configuration level (such as
certificates
,security groups
,ssl policy
,scheme
and etc..) in 1 place like a globalingress
and all other target group configuration level on my otheringress
resources it would be great.
Off topic but @dsaydon90 - Look into the alb.ingress.kubernetes.io/group.name
annotation. What we have done for this at my organization is to have a helm deployment of dummy "model" ingresses that have the global stuff (certs, SG's etc) along with the alb.ingress.kubernetes.io/group.name
annotation. Then simply create ingresses in the other apps that use that annotation. AWS-LBC will merge the new TG's into the "model" ingresses with the matching group.name
. This way we don't have to track all the annotations for every chart using the LB.
Hi @matthewmrichter , Thank you for providing your solution.
Actually, This is what I currently have, and I was set my certificates in 1 ingress
resource, but my problem occurred once I'm using additional ingress
resource which contains the annotation alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]
, the alb-ingress
will detect this annotation and will try to auto discover the right certificate for it.
What I'm looking for is to disable the auto discovery for certificate so I will have full control on my ALB configuration.
Is this bug still happening in v2.3?
@dsaydon90 To disable it, you have to manually specify at least one TLS certificate per Ingress.
@matthewmrichter There isn't any changes made to this logic yet. We have been planning to relax to restriction of discover one certificate per Ingress, so that you should be able to specify certificates in a centralized Ingress.
What I'm looking for is to disable the auto discovery for certificate so I will have full control on my ALB configuration.
@M00nF1sh this doesn't seem like the same behavior as V1.
We have ingresses which match two certificates - *.example.com
and foo.example.com
. These ingresses were created with V1 and the controller added the *.example.com
cert via the discovery feature.
Now, with V2 we get the error regarding the multiple certificates (for the existing ingresses), so this does seem like there were breaking changes somewhere.
As a side note, why can't the controller just load all of the certificates it finds?
@dudicoco I am in the same situation, the other cert is a wildcard. Why can't it do exact match?
@postmaxin I agree we can do it if we have a deterministic way. (e.g. pure sort based on lexical order of certificate arn). will it suits your need?
AWS has an algorithm for choosing the best cert if multiple are provided. I see no reason for us to do the same login on our part.
Their algorithm*
- Public key algorithm (prefer ECDSA over RSA)
- Hashing algorithm (prefer SHA over MD5)
- Key length (prefer the largest)
- Validity period
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/stale
is applied - After 30d of inactivity since
lifecycle/stale
was applied,lifecycle/rotten
is applied - After 30d of inactivity since
lifecycle/rotten
was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle stale
- Mark this issue or PR as rotten with
/lifecycle rotten
- Close this issue or PR with
/close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/remove-lifecycle stale
We're dealing with this also. There's no reason it can't just add all the certs that match, but we need to completely toss out the cert discovery functionality to use this controller it seems.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/stale
is applied - After 30d of inactivity since
lifecycle/stale
was applied,lifecycle/rotten
is applied - After 30d of inactivity since
lifecycle/rotten
was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle stale
- Mark this issue or PR as rotten with
/lifecycle rotten
- Close this issue or PR with
/close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
It could be an extra option that enables such a feature, so that compatibility is not broken. Any change to have it?
/remove-lifecycle stale
/remove-lifecycle stale
Seems like the most sensible and straightforward deterministic way is to select the most recent valid cert arn, or provide an annotation to do the same.
I have the use case of rotating my certificate in a multi step process where I
- Create a new cert
- Update all associations to the old cert to use the new one
- Delete the old cert
It would be convenient for me to have the controller pick the latest certificate (I'm taking inspiration from terraform's aws_acm_certificate
's most_recent
argument on data sources)
I'm hindered by this issue.
I have two wildcards certs in ACM, one more specific than the other, and the controller complains that it found multiple valid certificates for a setting that accepts multiple certificates. Why should I have to hardcode something to exactly what is being autodiscovered?
I'd like to move this on to the next environment without making environment specific settings, and this is issue prevents me from doing that.