cert-manager cannot renew k8s-io-prod certificate due to second IPv6 ingress
As initially reported and discussed on Slack, the k8s-io-prod certificate (used for the redirector service) is failing to renew.
After some debugging, there are two issues at play here:
- #1374 added a second, IPv6-only Ingress resource, with the AAAA record configured to point to it. cert-manager only knows how to update a single Ingress (named via the edit-in-place annotation) to inject path entries for the HTTP01 challenge solvers. As per https://letsencrypt.org/docs/ipv6-support/, if an AAAA record is returned then Let's Encrypt will prefer it and utilise it first. If we update cert-manager to only update the IPv6 Ingress resource, Let's Encrypt will quite likely pass validation (as it won't check IPv4); however, because cert-manager performs a 'self check' to ensure all routes are serving traffic correctly, and because our Pods do not utilise IPv6, that self check will never pass either. Ultimately, we need to ensure both Ingress resources contain the challenge path entries, which is not something that cert-manager supports today.
- When running kubectl describe on our Ingress resources, the following error is shown:
Warning Sync 6m38s (x29 over 85m) loadbalancer-controller Error during sync: error running backend syncing routine: googleapi: Error 403: Exceeded limit 'MAX_DISTINCT_NAMED_PORTS' on resource 'k8s-ig--ea949c440a044527'. Limit: 1000.0, limitExceeded
It appears that ingress-gce does not clean up 'unused named ports' - cleanup was originally added in https://github.com/kubernetes/ingress-gce/pull/430, but was later reverted in https://github.com/kubernetes/ingress-gce/pull/585.
We can see that there are a lot of named ports associated with the 3 'unmanaged instance groups' that ingress-gce creates:
gcloud compute instance-groups get-named-ports k8s-ig--ea949c440a044527 --project kubernetes-public --zone us-central1-c | wc -l
1001
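As an aside, the same count can be taken for each of the three instance groups. A minimal sketch - the other two zone names, and the group name being shared across zones, are assumptions (only us-central1-c is confirmed above):
# Count named ports on the ingress-gce instance group in each zone (zone list assumed).
for zone in us-central1-a us-central1-b us-central1-c; do
  printf '%s: ' "${zone}"
  gcloud compute instance-groups get-named-ports k8s-ig--ea949c440a044527 \
    --project kubernetes-public --zone "${zone}" | wc -l
done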
As you can see here, there are far fewer than 1000 nodePorts in our aaa cluster:
kubectl get services -A -o yaml | grep nodePort
nodePort: 30062
nodePort: 32044
nodePort: 30404
nodePort: 31694
nodePort: 32072
nodePort: 32212
nodePort: 32382
nodePort: 30980
nodePort: 30566
nodePort: 30633
nodePort: 32142
nodePort: 31365
nodePort: 31558
nodePort: 31752
nodePort: 32464
nodePort: 30414
nodePort: 32125
nodePort: 32392
nodePort: 32046
nodePort: 31204
nodePort: 32185
nodePort: 30887
nodePort: 30923
nodePort: 32006
nodePort: 30046
nodePort: 32489
nodePort: 31023
- nodePort: 31015
- nodePort: 31614
- nodePort: 30938
- nodePort: 30242
- nodePort: 32282
- nodePort: 30382
(note that some of these nodePort entries are for the 'challenge solvers' for the currently on-going/blocked renewal, and so they are not present in the get-named-ports output).
Short term solutions/moving forward
The current certificate expires on December 19th (so in ~9 days). We need to resolve both of these issues to get a renewal now.
For (1), I propose we take the simplest approach of manually copying the path entries that cert-manager injects into the second Ingress resource. We will then manually remove them again afterwards. This will allow both the v4 and v6 front end IPs to respond to HTTP01 challenge requests.
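For illustration, the manual copy could look roughly like the sketch below. The Ingress names, namespace, rule/path indices, solver service name and token are all placeholders (cert-manager generates them per challenge), and the solver service port of 8089 is an assumption:
# Inspect the solver entry cert-manager injected into the edit-in-place (IPv4) Ingress:
kubectl get ingress <ipv4-ingress> -n <namespace> -o yaml | grep -B2 -A6 acme-challenge
# Copy the equivalent path entry onto the IPv6-only Ingress (JSON patch; indices are placeholders):
kubectl patch ingress <ipv6-ingress> -n <namespace> --type=json -p='[
  {"op": "add", "path": "/spec/rules/0/http/paths/0",
   "value": {"path": "/.well-known/acme-challenge/<token>",
             "backend": {"serviceName": "cm-acme-http-solver-<hash>", "servicePort": 8089}}}]'
# Once the certificate has renewed, remove the copied entry again:
kubectl patch ingress <ipv6-ingress> -n <namespace> --type=json -p='[
  {"op": "remove", "path": "/spec/rules/0/http/paths/0"}]'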
For (2), we need to manually clean up some of these named ports. We have a list of all the nodePort allocations from kubectl get svc -A above, so we can write a script to calculate which ports are not actually used and set the full list appropriately for each of the instance groups that ingress-gce manages. If we make a mistake here or miss a port, I am not sure whether GCP will simply reject the change because it would break a load balancer, or whether the associated service will be unavailable until that port is added back.
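A rough sketch of that calculation, assuming the instance group name and zone from above, and that get-named-ports accepts the standard gcloud --format flag (the file paths and overall procedure are illustrative, not a vetted runbook):
IG=k8s-ig--ea949c440a044527
ZONE=us-central1-c
# nodePorts currently allocated by Services in the cluster:
kubectl get services -A -o yaml | grep nodePort | awk '{print $NF}' | sort -u > /tmp/in-use-ports
# port numbers currently configured as named ports on the instance group:
gcloud compute instance-groups get-named-ports "$IG" --project kubernetes-public \
  --zone "$ZONE" --format='value(port)' | sort -u > /tmp/configured-ports
# named ports on the instance group that no Service references any more:
comm -23 /tmp/configured-ports /tmp/in-use-ports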
Longer term solutions
For (1), I see a few avenues:
a) cert-manager is modified to be able to update multiple Ingress resources to inject routes for solving. This is a little out of the ordinary, but isn't the worst thing, especially given how many other awkward hoops we have to jump through to make HTTP01 solving work with the wide variety of ingress controller implementations.
b) write a controller that can be used by ingress-gce users to 'sync' the path entries on Ingress resources, treating one as authoritative. This would mean we could configure either the v4 or v6 Ingress to mirror the routes specified on the other one (that cert-manager updates). This feels cleaner than cert-manager updating multiple resources, but it'd be good to get feedback here.
c) improve cert-manager's extensibility story to allow for "out of tree" HTTP01 solvers, which would in turn mean we could have a standalone 'ingress-gce-solver' which would understand ingress-gce's nuances. (this would also allow for e.g. an out of tree IngressRoute/VirtualService/Ingress v2 solver too). This is certainly something that the cert-manager project should do anyway, though may be a bit more involved as a resolution given the scope of the issue here.
d) not use IPv6, or use something like ingress-nginx running in-cluster (exposed with a TCP load balancer) to handle ingresses
For (2), we are unlikely to be affected again for a while, but:
a) patch ingress-gce to allow it to clean up unused ports (it may take a while for this change to take effect)
b) write automation to make it easy for us to clean up unused ports (after learning how to do this safely whilst resolving the issue we face today)
I'm going to mark this as priority/critical-urgent as we have a count-down timer on this before it becomes very visible/public 😅 - if people are available for a call today/over the coming days, we should try and move swiftly to get agreement on the short term solution so we can all go home for the holidays 🎅 🎄
/priority critical-urgent /area cert-manager /cc @thockin @dims @aojea @BenTheElder @bartsmykla
@munnerz: The label(s) area/cert-manager cannot be applied, because the repository doesn't have them.
In response to this: [the original issue text, quoted above]
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/area infra/cert-manager
@munnerz A possible short-term workaround for (2) : https://github.com/kubernetes/ingress-gce/issues/43#issue-264678302/
Nice find 😄 here is a list of in-use ports after running ports=$(gcloud compute backend-services list --global --format='value(port,port)' | xargs printf 'port%s:%s,'):
port30046:30046,port30062:30062,port30242:30242,port30382:30382,port30938:30938,port31015:31015,port31023:31023,port31614:31614,port32282:32282,port32489:32489,port32044:32044,port30923:30923,
I can go ahead and update the named ports for each instance group to this list. What is the best way to coordinate running a potentially risky command like this? :)
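For reference, applying that list would look something like the sketch below. set-named-ports replaces the instance group's entire named-port list, so the value passed must be the complete intended set; the zone list beyond us-central1-c is an assumption:
IG=k8s-ig--ea949c440a044527
PORTS='port30046:30046,port30062:30062,...'   # the full comma-separated list from above
for zone in us-central1-a us-central1-b us-central1-c; do   # zones assumed
  gcloud compute instance-groups set-named-ports "$IG" \
    --project kubernetes-public --zone "$zone" --named-ports "$PORTS"
done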
/assign
@spencerhance @rramkumar1
@munnerz What services would be potentially impacted (documenting a list would be great)? That way we can determine risk and who we'd have to notify.
An update - @thockin and I worked together to 1) clean up the old named ports and 2) deal with the 'second Ingress' issue (caused by the addition of the IPv6 ingress) by copying the rules across temporarily, for this renewal only, to get us over the 'hump'.
The certificate now has notAfter=Mar 11 18:26:22 2021 GMT.
In the meantime, there's a number of issues we should work to resolve (copied from Slack):
- I think the whole "cert-manager updating two ingresses" thing could be solved by either having cert-manager understand how to: a) update two ingresses, or b) update all ingresses with path configurations for the domain being solved within a namespace (more complex, and doesn't account for default backends); or c) writing an 'ingress rule sync controller' - which would also be generally useful for other ingress-gce users wanting to run dual-stack services
- having two addresses associated with the one ingress-gce Ingress (one v4 and one v6) would be great, though I'm not sure how realistically we'll be able to achieve this in the next 3 months?
- have ingress-gce clean up named ports
@cblecker you can see the list here, for future reference 😄: https://github.com/kubernetes/k8s.io/blob/4260fd2b5494784e757c8126e20fe80fe0a9cd38/k8s.io/certificate-prod.yaml#L14-L66
/priority important-soon /remove-priority critical-urgent
awesome! thank you!
Thanks a ton @munnerz @thockin
GCP folks are looking at (3) and then (2) longer term
/assign
I'm going to work on 1(c) above today (a controller to sync the rules portion of Ingress resources). In the meantime, as we are nearing March 11th and this has not been resolved yet, I am applying the same workaround as in December to get us over the hump.
Update - certificate has been renewed:
status:
  conditions:
  - lastTransitionTime: "2020-03-06T11:15:52Z"
    message: Certificate is up to date and has not expired
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2021-05-31T10:22:27Z"
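For anyone checking later, something like the following should show both the Certificate resource status and the expiry of the certificate actually being served (the namespace is a placeholder):
kubectl get certificate k8s-io-prod -n <namespace> -o yaml
echo | openssl s_client -connect k8s.io:443 -servername k8s.io 2>/dev/null | openssl x509 -noout -enddate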
~~It seems like the workaround for this is to move the orphaned ports to another region? Is that right? I'm guessing that means there is no way to directly remove the ports? I cannot see any obvious method via the gcloud compute instance-groups CLI. Did anybody find a better solution? I'm worried I will just have to jump from region to region.~~ (sorry, misinterpreted the way set-named-ports works)
Obviously, I'm also going to avoid recreating ingresses unnecessarily, but that doesn't seem like a permanent solution when we are in heavy development and have frequent deployments.
It seems strange to me that more people are not hitting this issue. Is there some mitigating difference in how people are deploying services that I haven't thought of?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale /lifecycle frozen
Do we need this now?
On Mon, Jan 3, 2022 at 1:32 AM Arnaud M. @.***> wrote:
/remove-lifecycle stale /lifecycle frozen
@thockin I don't think we need this again. I just don't want to close it until cert-manager is fully removed.
@ameukam should this be on the radar for 2023?
Yes we should be removing cert-manager #4160
Issue addressed: GKE Networking now offers HTTPRoute and supports dual-stack.
/close
@ameukam: Closing this issue.
In response to this: [the comment quoted above]
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.