cert-manager cannot renew k8s-io-prod certificate due to second IPv6 ingress
As initially reported and discussed on Slack, the k8s-io-prod certificate (used for the redirector service) is failing to renew.
After some debugging, there are two issues at play here:
- #1374 added a second, IPv6-only Ingress resource, with the AAAA record configured to point to it. cert-manager only knows how to update a single Ingress (named via the edit-in-place annotation) to inject path entries for the HTTP01 challenge solvers. As per https://letsencrypt.org/docs/ipv6-support/, if an AAAA record is returned then Let's Encrypt will prefer it and utilise it first. If we update cert-manager to only update the IPv6 Ingress resource, Let's Encrypt will quite likely pass validation (as it won't check IPv4); however, because cert-manager performs a 'self check' to ensure all routes are serving traffic correctly, and because our Pods do not utilise IPv6, that self check will never pass either. Ultimately, we need to ensure both Ingress resources contain the challenge path entries, which is not something that cert-manager supports today.
- When running kubectl describe on our Ingress resources, the following error is shown:
Warning Sync 6m38s (x29 over 85m) loadbalancer-controller Error during sync: error running backend syncing routine: googleapi: Error 403: Exceeded limit 'MAX_DISTINCT_NAMED_PORTS' on resource 'k8s-ig--ea949c440a044527'. Limit: 1000.0, limitExceeded
It appears that ingress-gce does not clean up 'unused named ports' - cleanup was originally added in https://github.com/kubernetes/ingress-gce/pull/430, but was later reverted in https://github.com/kubernetes/ingress-gce/pull/585.
We can see that there are a lot of named ports associated with the 3 'unmanaged instance groups' that ingress-gce creates:
gcloud compute instance-groups get-named-ports k8s-ig--ea949c440a044527 --project kubernetes-public --zone us-central1-c | wc -l
1001
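As an aside, the same count can be taken for each of the three instance groups. A minimal sketch - the other two zone names, and the group name being shared across zones, are assumptions (only us-central1-c is confirmed above):
# Count named ports on the ingress-gce instance group in each zone (zone list assumed).
for zone in us-central1-a us-central1-b us-central1-c; do
  printf '%s: ' "${zone}"
  gcloud compute instance-groups get-named-ports k8s-ig--ea949c440a044527 \
    --project kubernetes-public --zone "${zone}" | wc -l
done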
As you can see here, there are far fewer than 1000 nodePorts in our aaa cluster:
kubectl get services -A -o yaml | grep nodePort
nodePort: 30062
nodePort: 32044
nodePort: 30404
nodePort: 31694
nodePort: 32072
nodePort: 32212
nodePort: 32382
nodePort: 30980
nodePort: 30566
nodePort: 30633
nodePort: 32142
nodePort: 31365
nodePort: 31558
nodePort: 31752
nodePort: 32464
nodePort: 30414
nodePort: 32125
nodePort: 32392
nodePort: 32046
nodePort: 31204
nodePort: 32185
nodePort: 30887
nodePort: 30923
nodePort: 32006
nodePort: 30046
nodePort: 32489
nodePort: 31023
- nodePort: 31015
- nodePort: 31614
- nodePort: 30938
- nodePort: 30242
- nodePort: 32282
- nodePort: 30382
(note that some of these nodePort entries are for the 'challenge solvers' for the currently on-going/blocked renewal, and so they are not present in the get-named-ports output).
Short term solutions/moving forward
The current certificate expires on December 19th (so in ~9 days). We need to resolve both of these issues to get a renewal now.
For (1), I propose we take the simplest approach of manually copying the path entries that cert-manager injects into the second Ingress resource. We will then manually remove them again afterwards. This will allow both the v4 and v6 front end IPs to respond to HTTP01 challenge requests.
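For illustration, the manual copy could look roughly like the sketch below. The Ingress names, namespace, rule/path indices, solver service name and token are all placeholders (cert-manager generates them per challenge), and the solver service port of 8089 is an assumption:
# Inspect the solver entry cert-manager injected into the edit-in-place (IPv4) Ingress:
kubectl get ingress <ipv4-ingress> -n <namespace> -o yaml | grep -B2 -A6 acme-challenge
# Copy the equivalent path entry onto the IPv6-only Ingress (JSON patch; indices are placeholders):
kubectl patch ingress <ipv6-ingress> -n <namespace> --type=json -p='[
  {"op": "add", "path": "/spec/rules/0/http/paths/0",
   "value": {"path": "/.well-known/acme-challenge/<token>",
             "backend": {"serviceName": "cm-acme-http-solver-<hash>", "servicePort": 8089}}}]'
# Once the certificate has renewed, remove the copied entry again:
kubectl patch ingress <ipv6-ingress> -n <namespace> --type=json -p='[
  {"op": "remove", "path": "/spec/rules/0/http/paths/0"}]'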
For (2), we need to manually clean up some of these named ports. We have a list of all the nodePort allocations from kubectl get svc -A above, so we can write a script to calculate which ports are not actually used and set the full list appropriately for each of the instance groups that ingress-gce manages. If we make a mistake here or miss a port, I am not sure whether GCP will simply reject the change because it would break a load balancer, or whether the associated service will be unavailable until that port is added back.
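A rough sketch of that calculation, assuming the instance group name and zone from above, and that get-named-ports accepts the standard gcloud --format flag (the file paths and overall procedure are illustrative, not a vetted runbook):
IG=k8s-ig--ea949c440a044527
ZONE=us-central1-c
# nodePorts currently allocated by Services in the cluster:
kubectl get services -A -o yaml | grep nodePort | awk '{print $NF}' | sort -u > /tmp/in-use-ports
# port numbers currently configured as named ports on the instance group:
gcloud compute instance-groups get-named-ports "$IG" --project kubernetes-public \
  --zone "$ZONE" --format='value(port)' | sort -u > /tmp/configured-ports
# named ports on the instance group that no Service references any more:
comm -23 /tmp/configured-ports /tmp/in-use-ports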
Longer term solutions
For (1), I see a few avenues:
a) cert-manager is modified to be able to update multiple Ingress resources to inject routes for solving. This is a little out of the ordinary, but isn't the worst thing, especially given how many other awkward hoops we have to jump through to make HTTP01 solving work with the wide variety of ingress controller implementations.
b) write a controller that can be used by ingress-gce users to 'sync' the path entries on Ingress resources, treating one as authoritative. This would mean we could configure either the v4 or v6 Ingress to mirror the routes specified on the other one (that cert-manager updates). This feels cleaner than cert-manager updating multiple resources, but it'd be good to get feedback here.
c) improve cert-manager's extensibility story to allow for "out of tree" HTTP01 solvers, which would in turn mean we could have a standalone 'ingress-gce-solver' which would understand ingress-gce's nuances. (this would also allow for e.g. an out of tree IngressRoute/VirtualService/Ingress v2 solver too). This is certainly something that the cert-manager project should do anyway, though may be a bit more involved as a resolution given the scope of the issue here.
d) not use IPv6, or use something like ingress-nginx running in-cluster (exposed with a TCP load balancer) to handle ingresses
For (2), we are unlikely to be affected again for a while, but:
a) patch ingress-gce to allow it to clean up unused ports (it may take a while for this change to take effect)
b) write automation to make it easy for us to clean up unused ports (after learning how to do this safely whilst resolving the issue we face today)
I'm going to mark this as priority/critical-urgent as we have a count-down timer on this before it becomes very visible/public 😅 - if people are available for a call today/over the coming days, we should try and move swiftly to get agreement on the short term solution so we can all go home for the holidays 🎅 🎄
/priority critical-urgent /area cert-manager /cc @thockin @dims @aojea @BenTheElder @bartsmykla
@munnerz: The label(s) area/cert-manager cannot be applied, because the repository doesn't have them.
In response to this: [the original issue text, quoted above]
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/area infra/cert-manager
@munnerz A possible short-term workaround for (2) : https://github.com/kubernetes/ingress-gce/issues/43#issue-264678302/
Nice find 😄 here is a list of in-use ports after running ports=$(gcloud compute backend-services list --global --format='value(port,port)' | xargs printf 'port%s:%s,'):
port30046:30046,port30062:30062,port30242:30242,port30382:30382,port30938:30938,port31015:31015,port31023:31023,port31614:31614,port32282:32282,port32489:32489,port32044:32044,port30923:30923,
I can go ahead and update the named ports for each instance group to this list. What is the best way to coordinate running a potentially risky command like this? :)
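For reference, applying that list would look something like the sketch below. set-named-ports replaces the instance group's entire named-port list, so the value passed must be the complete intended set; the zone list beyond us-central1-c is an assumption:
IG=k8s-ig--ea949c440a044527
PORTS='port30046:30046,port30062:30062,...'   # the full comma-separated list from above
for zone in us-central1-a us-central1-b us-central1-c; do   # zones assumed
  gcloud compute instance-groups set-named-ports "$IG" \
    --project kubernetes-public --zone "$zone" --named-ports "$PORTS"
done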
/assign
@spencerhance @rramkumar1
@munnerz What services would be potentially impacted (documenting a list would be great)? That way we can determine risk and who we'd have to notify.
An update - @thockin and I worked together to 1) clean up the old named ports and 2) deal with the 'second Ingress' issue (caused by the addition of the IPv6 ingress) by copying the rules across temporarily, for this renewal only, to get us over the 'hump'.
The certificate now has notAfter=Mar 11 18:26:22 2021 GMT.
In the meantime, there's a number of issues we should work to resolve (copied from Slack):
- I think the whole "cert-manager updating two ingresses" thing could be solved by either having cert-manager understand how to: a) update two ingresses, or b) update all ingresses with path configurations for the domain being solved within a namespace (more complex, and doesn't account for default backends); or c) writing an 'ingress rule sync controller' - which would also be generally useful for other ingress-gce users wanting to run dual-stack services
- having two addresses associated with the one ingress-gce Ingress (one v4 and one v6) would be great, though I'm not sure how realistically we'll be able to achieve this in the next 3 months?
- have ingress-gce clean up named ports
@cblecker you can see the list here, for future reference 😄: https://github.com/kubernetes/k8s.io/blob/4260fd2b5494784e757c8126e20fe80fe0a9cd38/k8s.io/certificate-prod.yaml#L14-L66
/priority important-soon /remove-priority critical-urgent
awesome! thank you!
Thanks a ton @munnerz @thockin
GCP folks are looking at (3) and then (2) longer term
/assign
I'm going to work on 1(c) above today (a controller to sync the rules portion of Ingress resources). In the meantime, as we are nearing March 11th and this has not been resolved yet, I am applying the same workaround as in December to get us over the hump.
Update - certificate has been renewed:
status:
  conditions:
  - lastTransitionTime: "2020-03-06T11:15:52Z"
    message: Certificate is up to date and has not expired
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2021-05-31T10:22:27Z"
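For anyone checking later, something like the following should show both the Certificate resource status and the expiry of the certificate actually being served (the namespace is a placeholder):
kubectl get certificate k8s-io-prod -n <namespace> -o yaml
echo | openssl s_client -connect k8s.io:443 -servername k8s.io 2>/dev/null | openssl x509 -noout -enddate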
~~It seems like the workaround for this is to move the orphaned ports to another region? Is that right? I'm guessing that means there is no way to directly remove the ports? I cannot see any obvious method via the gcloud compute instance-groups CLI. Did anybody find a better solution? I'm worried I will just have to jump from region to region.~~ (sorry, misinterpreted the way set-named-ports works)
Obviously, I'm also going to avoid recreating ingresses unnecessarily, but that doesn't seem like a permanent solution when we are in heavy development and have frequent deployments.
It seems strange to me that more people are not hitting this issue. Is there some mitigating difference in how people are deploying services that I haven't thought of?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale /lifecycle frozen
Do we need this now?
On Mon, Jan 3, 2022 at 1:32 AM Arnaud M. @.***> wrote:
/remove-lifecycle stale /lifecycle frozen
@thockin I don't think we need this again. I just don't want to close it until cert-manager is fully removed.
@ameukam should this be on the radar for 2023?
Yes we should be removing cert-manager #4160
Issue addressed: GKE Networking now offers HTTPRoute and supports dual-stack.
/close
@ameukam: Closing this issue.
In response to this: [the comment quoted above]
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.