aws-load-balancer-controller icon indicating copy to clipboard operation
aws-load-balancer-controller copied to clipboard

[v2] Error when there are multiple certificates for a host

Open postmaxin opened this issue 3 years ago • 28 comments

It is perfectly fine for there two be two certificates available for a particular host. I have this right now because I am in mid-migration and have both a wildcard, and host-specific, cert. In these cases, the controller should either add both certificates, or add (in my opinion) the least specific certificate.

{"level":"error","ts":1604259893.3354607,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"dex","namespace":"kube-system","error":"multiple certificate found for host: dex.admin.rex.sh, certARNs: [arn:aws:acm:us-west-2:355508092300:certificate/c6d7d07b-0aec-49ae-aff6-3616afb4b0a7 arn:aws:acm:us-west-2:355508092300:certificate/eeba7334-1671-4ad9-bc2f-d458ef972fdb]"}

postmaxin avatar Nov 01 '20 19:11 postmaxin

@postmaxin That's why we allow to specify the certificate-arn annotation to override the cert-discovery feature. The controller cannot make a choice which certificate to use when there are multiple matches. 😄

M00nF1sh avatar Nov 02 '20 20:11 M00nF1sh

It can, it just does not. It has no criteria. Example criteria would be: Length of domain name, expiry date, etc. If it makes a choice based on some criteria, it may not be the exact criteria everyone who uses AWSv2 would choose. However, it would be a choice, that is predictable, that breaks things less than having AWSv2 give up.

postmaxin avatar Nov 02 '20 20:11 postmaxin

I would suggest that in this situation, the controller should pick a certificate in a deterministic way, but log a warning saying that there were other choices. That's better than it just throwing it's hands up and walking away.

postmaxin avatar Nov 02 '20 22:11 postmaxin

@postmaxin I agree we can do it if we have a deterministic way. (e.g. pure sort based on lexical order of certificate arn). will it suits your need?

M00nF1sh avatar Nov 03 '20 00:11 M00nF1sh

It is behaviour that changed between v1 and v2 I think, so I'd say it should do whatever v1 did to keep the 'backwards-compatible' promise :)

Scenario that I have ingress rules for "example.com", "*.example.com", "sub.example.com" and "*.sub.example.com". With two certificates, "*.example.com" with additional domains "example.com" and "*.sub.example.com" with additional domain "sub.example.com". So there is some overlap in the domain names. But this worked in v1, v2 complains that the sub.example.com ingress now matches two certificates, which is correct. But v1 seemed to pick one (the 'exact match')...?

jor-rit avatar Nov 03 '20 14:11 jor-rit

@jorrit-wehelp it's same behavior with v1: https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/master/internal/alb/ls/cert_discovery.go#L54 (if multiple certificate found, it will error out)

M00nF1sh avatar Nov 03 '20 18:11 M00nF1sh

ALB supports multiple certificates, so why doesn't the controller just add them all?

stephenharris avatar Nov 14 '20 23:11 stephenharris

I see the same issue, despite having the alb.ingress.kubernetes.io/certificate-arn annotation configured. The "old" certificate was used by the controller, after adding the new certificate and changing the configuration, it gets stuck.

After thinking hard, I noticed this: I have two ingress objects sharing an ingress group. As alb.ingress.kubernetes.io/certificate-arn is listed as "merge", I annotated only one Ingress in the group, assuming it would be used for all ingresses in the group. But the controller logged this message for the other ingress in the same group without the annotation.

So apparently the "determine-the-right-certificate" code runs for each ingress object individually, not for the merged annotations of an ingress group?

MartinEmrich avatar Nov 17 '20 16:11 MartinEmrich

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Feb 15 '21 16:02 fejta-bot

/remove-lifecycle stale

This is an issue especially when you are using wildcard subdomains. You might want *.test.company.com but *.company.com already exists. There's no reason you can't add both.

So, for an org using wildcard subdomains, we have to manually specify the ARN for every ingress we provision?

edit: I found a workaround. The wildcard subdomain had a SAN that conflicted with the wildcard domain.

*.test.company.com with a SAN of test.company.com -- conflicts with *.company.com

Recreated the ACM certificate of *.test.company.com with no SAN, and the ingress controller added both *.test.company.com and *.company.com. The routes used in this case are *.test.company.com and test.company.com.

Still, I think it would be better if there was some deterministic logic and any choice made, instead of throwing the "could not auto-load certificates from ACM: multiple certificate found for host" error and giving up.

fluxx-sleblanc avatar Mar 05 '21 23:03 fluxx-sleblanc

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jun 04 '21 00:06 fejta-bot

This was a frustrating bug/behavior to find at the last minute when trying to go-live.

At the very least, could the AWS LBC not attempt auto-discovery beyond the list you give with the certificate-arn annotation? That would limit the opportunity for confusion. Not that it should matter.

matthewmrichter avatar Jul 26 '21 19:07 matthewmrichter

@M00nF1sh , There is any way to disable the Certificate Discovery?

I can't see any way to disable it on Docs: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/cert_discovery/

I really think that if I could manage all my ALB configuration level (such as certificates, security groups, ssl policy, scheme and etc..) in 1 place like a global ingress and all other target group configuration level on my other ingress resources it would be great.

devopsmash avatar Aug 16 '21 11:08 devopsmash

I really think that if I could manage all my ALB configuration level (such as certificates, security groups, ssl policy, scheme and etc..) in 1 place like a global ingress and all other target group configuration level on my other ingress resources it would be great.

Off topic but @dsaydon90 - Look into the alb.ingress.kubernetes.io/group.name annotation. What we have done for this at my organization is to have a helm deployment of dummy "model" ingresses that have the global stuff (certs, SG's etc) along with the alb.ingress.kubernetes.io/group.name annotation. Then simply create ingresses in the other apps that use that annotation. AWS-LBC will merge the new TG's into the "model" ingresses with the matching group.name. This way we don't have to track all the annotations for every chart using the LB.

matthewmrichter avatar Aug 16 '21 18:08 matthewmrichter

Hi @matthewmrichter , Thank you for providing your solution. Actually, This is what I currently have, and I was set my certificates in 1 ingress resource, but my problem occurred once I'm using additional ingress resource which contains the annotation alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}] , the alb-ingress will detect this annotation and will try to auto discover the right certificate for it.

What I'm looking for is to disable the auto discovery for certificate so I will have full control on my ALB configuration.

devopsmash avatar Aug 17 '21 08:08 devopsmash

Is this bug still happening in v2.3?

matthewmrichter avatar Nov 01 '21 19:11 matthewmrichter

@dsaydon90 To disable it, you have to manually specify at least one TLS certificate per Ingress.

@matthewmrichter There isn't any changes made to this logic yet. We have been planning to relax to restriction of discover one certificate per Ingress, so that you should be able to specify certificates in a centralized Ingress.

M00nF1sh avatar Nov 11 '21 18:11 M00nF1sh

What I'm looking for is to disable the auto discovery for certificate so I will have full control on my ALB configuration.

@M00nF1sh this doesn't seem like the same behavior as V1.

We have ingresses which match two certificates - *.example.com and foo.example.com. These ingresses were created with V1 and the controller added the *.example.com cert via the discovery feature. Now, with V2 we get the error regarding the multiple certificates (for the existing ingresses), so this does seem like there were breaking changes somewhere.

As a side note, why can't the controller just load all of the certificates it finds?

dudicoco avatar Nov 29 '21 09:11 dudicoco

@dudicoco I am in the same situation, the other cert is a wildcard. Why can't it do exact match?

kferrone avatar Dec 03 '21 17:12 kferrone

@postmaxin I agree we can do it if we have a deterministic way. (e.g. pure sort based on lexical order of certificate arn). will it suits your need?

AWS has an algorithm for choosing the best cert if multiple are provided. I see no reason for us to do the same login on our part.

Their algorithm*

  1. Public key algorithm (prefer ECDSA over RSA)
  2. Hashing algorithm (prefer SHA over MD5)
  3. Key length (prefer the largest)
  4. Validity period

LechG avatar Jan 07 '22 09:01 LechG

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 07 '22 10:04 k8s-triage-robot

/remove-lifecycle stale

matthewmrichter avatar Apr 07 '22 13:04 matthewmrichter

/remove-lifecycle stale

paul-civitas avatar May 20 '22 19:05 paul-civitas

We're dealing with this also. There's no reason it can't just add all the certs that match, but we need to completely toss out the cert discovery functionality to use this controller it seems.

paul-civitas avatar May 20 '22 19:05 paul-civitas

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 18 '22 21:08 k8s-triage-robot

It could be an extra option that enables such a feature, so that compatibility is not broken. Any change to have it?

/remove-lifecycle stale

mih-kopylov avatar Aug 19 '22 06:08 mih-kopylov

/remove-lifecycle stale

k1rk avatar Sep 06 '22 07:09 k1rk

Seems like the most sensible and straightforward deterministic way is to select the most recent valid cert arn, or provide an annotation to do the same.

reubenavery avatar Sep 28 '22 14:09 reubenavery

I have the use case of rotating my certificate in a multi step process where I

  • Create a new cert
  • Update all associations to the old cert to use the new one
  • Delete the old cert

It would be convenient for me to have the controller pick the latest certificate (I'm taking inspiration from terraform's aws_acm_certificate's most_recent argument on data sources)

drewcorlin1 avatar Nov 02 '22 22:11 drewcorlin1

I'm hindered by this issue.

I have two wildcards certs in ACM, one more specific than the other, and the controller complains that it found multiple valid certificates for a setting that accepts multiple certificates. Why should I have to hardcode something to exactly what is being autodiscovered?

I'd like to move this on to the next environment without making environment specific settings, and this is issue prevents me from doing that.

bpaddy-brivo avatar Nov 09 '22 18:11 bpaddy-brivo