[EKS] [request]: Support for certificate rotation on EKS clusters
Tell us about your request
- Provide a path to update the root CA on affected clusters to include SKID and AKI extensions.
- Alternatively, offer an automated upgrade/migration strategy or mitigation to support Python 3.13+.
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Recent versions of Python (3.13+) enforce stricter SSL certificate validation and require modern X.509 extensions (Subject Key Identifier and Authority Key Identifier) in the certificate chain.
EKS clusters originally created on Kubernetes v1.16 or earlier have a root CA certificate that lacks the Subject Key Identifier (SKI) extension, and the API server certificates issued from it lack the Authority Key Identifier (AKI) extension. As a result, Python 3.13+ clients (e.g., using requests or urllib3) fail with CERTIFICATE_VERIFY_FAILED errors.
This is a blocker for services using these clusters with modern Python runtimes.
Impact:
- Affects all SSL connections from Python 3.13+ clients to EKS clusters created on Kubernetes ≤v1.16.
- Tools like curl, kubectl, and openssl continue to work, which can mask the issue.
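Because curl, kubectl, and openssl keep working, it can help to inspect the certificates directly to see whether a cluster is affected. The sketch below is a minimal, stdlib-only Python example (helper names are hypothetical) that walks the DER encoding of a PEM certificate and reports whether it carries the SKI (OID 2.5.29.14) and AKI (OID 2.5.29.35) extensions. It assumes the CA comes from the kubeconfig `certificate-authority-data` field, which holds base64-encoded PEM, and it only handles well-formed DER as found in real certificates.

```python
import base64
import ssl

# DER bodies (no tag/length) of the two extension OIDs checked by strict mode.
SKI_OID = bytes([0x55, 0x1D, 0x0E])  # 2.5.29.14 subjectKeyIdentifier
AKI_OID = bytes([0x55, 0x1D, 0x23])  # 2.5.29.35 authorityKeyIdentifier


def read_tlv(data: bytes, pos: int) -> tuple:
    """Read one DER tag-length-value at pos; return (tag, value, next_pos)."""
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the count of length bytes
        n = length & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    return tag, data[pos:pos + length], pos + length


def children(data: bytes):
    """Yield (tag, value) for each TLV packed back to back in data."""
    pos = 0
    while pos < len(data):
        tag, value, pos = read_tlv(data, pos)
        yield tag, value


def extension_oids(der: bytes) -> set:
    """Return the raw OID bytes of every X.509v3 extension in a certificate."""
    _, cert, _ = read_tlv(der, 0)  # Certificate ::= SEQUENCE
    _, tbs, _ = read_tlv(cert, 0)  # first field is tbsCertificate
    oids = set()
    for tag, value in children(tbs):
        if tag == 0xA3:  # [3] EXPLICIT Extensions
            _, ext_list, _ = read_tlv(value, 0)
            for _, ext in children(ext_list):
                _, oid, _ = read_tlv(ext, 0)  # extnID is the first field
                oids.add(oid)
    return oids


def key_id_report(pem: str) -> dict:
    """Report whether a PEM certificate carries SKI and AKI extensions."""
    oids = extension_oids(ssl.PEM_cert_to_DER_cert(pem))
    return {"SKI": SKI_OID in oids, "AKI": AKI_OID in oids}


def report_from_kubeconfig_data(ca_data_b64: str) -> dict:
    """Decode kubeconfig certificate-authority-data (base64 PEM) and report."""
    return key_id_report(base64.b64decode(ca_data_b64).decode("ascii"))
```

On an affected cluster the CA would be expected to report `SKI: False`, and the serving certificate (which `ssl.get_server_certificate((host, 443))` returns as PEM) to report `AKI: False`.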
Are you currently working around this issue?
Currently we have to set the SKIP_TLS_VERIFY environment variable for affected workloads.
Additional context
Clusters created with Kubernetes 1.17 and later generate a Certificate Authority that contains an "X509v3 Subject Key Identifier" extension, and the resulting Kube API server certificates generated by kubeadm include an AKI extension.
Python 3.13 (released October 2024) enables VERIFY_X509_PARTIAL_CHAIN and VERIFY_X509_STRICT by default in ssl.create_default_context(). VERIFY_X509_STRICT requires an AKI extension on the serving certificate and an SKI extension on the CA.
The errors you are experiencing are a result of this Python change, which makes an SKI on the CA and an AKI on the leaf a requirement, while RFC 5280 (the RFC governing X.509 certificates) marks the AKI as a recommendation (see https://datatracker.ietf.org/doc/html/rfc5280#section-4.2.1.1). We did explore changing kubeadm to set an AKI on Kube API serving certificates to a hash of the CA certificate when the CA has no SKI. This conforms to RFC 5280 and works for clients like curl, but Python 3.13 clients still fail with a "Missing Subject Key Identifier" error.
The two immediate solutions are to configure the SSL contexts of Python 3.13 Kubernetes API clients to disable VERIFY_X509_STRICT, or to create a new EKS cluster, whose CA will have an SKI and whose API server certificate will have an AKI.
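A minimal sketch of the first workaround, assuming the client library can be handed an `ssl.SSLContext`: start from the stdlib default context and clear only VERIFY_X509_STRICT, so chain, hostname, and expiry verification stay in force, which is narrower than skipping TLS verification entirely. The function name is hypothetical; `cadata` would be the decoded PEM of the cluster CA.

```python
import ssl
from typing import Optional


def relaxed_strict_context(cafile: Optional[str] = None,
                           cadata: Optional[str] = None) -> ssl.SSLContext:
    """Default client context with only VERIFY_X509_STRICT cleared."""
    # create_default_context keeps CERT_REQUIRED and hostname checking on.
    ctx = ssl.create_default_context(cafile=cafile, cadata=cadata)
    # Drop just the strict flag; PARTIAL_CHAIN and other flags are left as-is.
    ctx.verify_flags &= ~ssl.VERIFY_X509_STRICT
    return ctx
```

For example, `urllib.request.urlopen(url, context=relaxed_strict_context(cadata=ca_pem))`, or urllib3's `PoolManager(ssl_context=...)`. Whether a particular Kubernetes client library exposes a way to pass a custom context needs to be checked for that library; this remains a stopgap until the CA itself can be rotated.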
EKS does not yet support CA rotation, but we have started researching how to support this. We can use this issue to track the feature. This will enable customers to start using a newly generated Certificate Authority, which would include an SKI, and Kube API server certificates which will include an AKI extension. While eventual certificate rotation will solve the issue, it will be a necessarily disruptive operation that will likely require both pod and node restarts across a cluster.
This would be really important for us. We have a cluster created five years ago (I think it was K8s 1.16), and we cannot easily recreate it because it is our production cluster. We would really appreciate a method to regenerate the control plane certificate.
@mikestef9 any idea if this has gotten any traction? This is causing issues and will be a bigger and bigger problem.
We're starting to have to disable TLS verification as well, to deal with our clusters that we created prior to 1.17. Having to recreate clusters would be quite burdensome.
This is also important in case of cluster certificate compromise, high-privilege user offboarding, etc.