strimzi-kafka-operator

Python >= 3.13 clients fail to connect with self-signed TLS certs due to VERIFY_X509_STRICT

Open fallen-up opened this issue 7 months ago • 20 comments

We are using Strimzi Kafka with authentication.type: tls and self-signed certificates.

Clients running on Python versions ≤3.12 have been able to connect without issues. However, after upgrading to Python >=3.13, connection attempts fail with the following error:

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Missing Authority Key Identifier (_ssl.c:1020)

This appears to be caused by a change introduced in Python 3.13, where ssl.create_default_context() now includes the VERIFY_X509_STRICT flag by default: https://docs.python.org/3/whatsnew/3.13.html#ssl

Note: VERIFY_X509_STRICT may reject pre-RFC 5280 or malformed certificates that the underlying OpenSSL implementation might otherwise accept.
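To check whether a given interpreter is affected, the flags on the default context can be inspected directly. This is a minimal sketch; the helper name `strict_enabled_by_default` is illustrative and not part of any library:

```python
import ssl
import sys


def strict_enabled_by_default() -> bool:
    """Return True if ssl.create_default_context() sets VERIFY_X509_STRICT."""
    ctx = ssl.create_default_context()
    return bool(ctx.verify_flags & ssl.VERIFY_X509_STRICT)


if __name__ == "__main__":
    # Prints the interpreter version and whether strict checks are on by default.
    print(sys.version_info[:2], strict_enabled_by_default())
```

Running it under both the old and the new interpreter should show where the default changed.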

A workaround is to disable the flag manually:

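import ssl

# Start from the default client context, then clear the strict RFC 5280
# checks that Python 3.13 enables by default.
ctx = ssl.create_default_context()
ctx.verify_flags &= ~ssl.VERIFY_X509_STRICT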

However, this is not ideal, as it reduces the level of certificate validation. The issue is not related to encryption but to strict RFC 5280 compliance, in particular the absence of an Authority Key Identifier in the CA certificate.
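Until the certificates are fixed, one way to limit the impact is to relax the flag only on the context used for the Kafka connection and leave every other check enabled. The sketch below uses a hypothetical helper name and an assumed CA file path; it is not a Strimzi or Kafka client API:

```python
import ssl
from typing import Optional


def kafka_ssl_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client context that skips only the strict RFC 5280 checks."""
    ctx = ssl.create_default_context(cafile=cafile)
    # Clear only VERIFY_X509_STRICT; chain and hostname verification stay on.
    ctx.verify_flags &= ~ssl.VERIFY_X509_STRICT
    return ctx


# Pass cafile="ca.crt" (assumed path) to trust only the cluster CA.
ctx = kafka_ssl_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Clients that accept a pre-built `ssl.SSLContext` (kafka-python exposes an `ssl_context` option, for example) can then be given this context, so the relaxed check does not leak into other TLS connections made by the same process.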

As more teams begin migrating to Python ≥3.13, this is becoming a more pressing and widespread issue.

Please consider updating the certificate generation process in Strimzi to produce RFC 5280-compliant certificates, or at least provide an option (e.g., via feature gate) to enable such compliance when needed.

Thanks in advance!

fallen-up avatar Apr 22 '25 22:04 fallen-up

I found out that this problem is caused by the broker certificates, which currently don't have an AKI. When Python connects to Kafka, it checks that the broker certificates are signed by the same CA and that the SKI of the CA equals the AKI of the broker certificate.
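The link between the two extensions is a plain byte comparison. A minimal sketch, assuming the common RFC 5280 method of deriving the key identifier as the SHA-1 hash of the CA's public key bits (the function names and the placeholder key bytes are illustrative only):

```python
import hashlib


def key_identifier(subject_public_key_bits: bytes) -> bytes:
    """RFC 5280, section 4.2.1.2, method (1): SHA-1 of the subjectPublicKey bits."""
    return hashlib.sha1(subject_public_key_bits).digest()


def aki_matches_issuer(leaf_aki: bytes, issuer_ski: bytes) -> bool:
    """The issued certificate's AKI keyIdentifier should equal the issuer's SKI."""
    return leaf_aki == issuer_ski


# Placeholder bytes standing in for the CA's real public key.
ca_public_key = b"example-ca-public-key"
ca_ski = key_identifier(ca_public_key)

broker_aki = ca_ski  # what a compliant issuer would copy into the broker cert
print(aki_matches_issuer(broker_aki, ca_ski))  # True
print(aki_matches_issuer(b"", ca_ski))         # False: AKI missing or wrong
```

Note that the error reported in this issue is about the AKI being absent altogether, which fails before any such comparison can take place.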

o-afanasenko avatar May 01 '25 12:05 o-afanasenko

I had to look into RFC 5280 to get more information about the AKI. AFAIU it's optional, not mandatory, but it seems that your Python client checks it by default. I also looked into how the Java clients behave, and it seems that the Java truststore manager runs this validation mostly through the truststore: it checks whether the CA used to sign the certificate is in the store, and trusts it if so. So the AKI isn't used there. Anyway, I am not against adding it if it helps other clients to work. Of course, it must not break compatibility with others, but given that this validation is optional, other clients would just skip it without checking the AKI extension. I also left a comment on the corresponding PR. Finally, I will leave it to the others to give their opinion on this.

ppatierno avatar May 05 '25 12:05 ppatierno

If we move forward with this issue and the corresponding PR, we should also check that cert-manager issues certificates that include the AKI. I couldn't find anything explicit in the cert-manager documentation, so it should probably be verified. @katheris, is there anything you already know about this from your work with cert-manager for Strimzi?

ppatierno avatar May 05 '25 16:05 ppatierno

I wonder if this should have a feature gate to introduce it gradually, because it can also happen that adding it will cause issues for someone else. A FG would allow us to make it optional first and give others more time to adjust if needed. I'm not really a big expert on TLS, so I'm not sure how likely it is to cause problems somewhere. But given the number of environments, old clients, Java versions, OS versions, etc., it is pretty hard for us to test it.

scholzj avatar May 05 '25 19:05 scholzj

Guys, @ppatierno @scholzj, you are asking for the opposite things. Now I have a gradual approach - only broker and CC certificates are affected as @scholzj asks. But @ppatierno suggested to apply AKI to all certificates. Personally I prefer current way and if it is fine, I will do a PR for a newer version for other certificate types.

o-afanasenko avatar May 06 '25 06:05 o-afanasenko

I am not sure this change really needs a FG. What this is going to add is an AKI field on the servers' certificates, and it's then up to the client side to use this field to validate the issuer. I read that Java, for example, doesn't use it but just leverages the truststore (so the issuer is fine if it's in the truststore). It seems that the Python client validates it, and of course we don't know about other clients. But AFAICS the validation happens on the client side, so if the field is there it helps clients which use it for validation, and it should not break clients which don't. This is my understanding.

But @ppatierno suggested to apply AKI to all certificates. Personally I prefer current way and if it is fine,

I was suggesting (on the PR) to add it to the self-signed CA as well (where SKI will be the same as AKI) for consistency but I can live without it.

I will do a PR for a newer version for other certificate types.

Wdym? Which certificate types are you talking about?

ppatierno avatar May 06 '25 07:05 ppatierno

I don't think we are asking for opposite things. I agree with Paolo on doing this for all certificates.

What I suggest (at least for consideration) is that this should be introduced through a feature gate to make sure it is opt-in first, then opt-out and only at the end it is enabled permanently for everyone.
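The opt-in, opt-out, always-on progression can be sketched as follows. This is a generic illustration, not Strimzi's implementation: the `+Name`/`-Name` configuration string is an assumption, and `UseAkiExtension` is a made-up gate name.

```python
from enum import Enum


class Stage(Enum):
    ALPHA = "alpha"  # disabled by default, users opt in
    BETA = "beta"    # enabled by default, users can opt out
    GA = "ga"        # always enabled, can no longer be disabled


def gate_enabled(name: str, stage: Stage, config: str) -> bool:
    """Resolve a gate from a comma-separated "+Name,-Other" style string."""
    if stage is Stage.GA:
        return True
    enabled = stage is Stage.BETA  # start from the stage default
    for item in filter(None, (part.strip() for part in config.split(","))):
        if item[1:] == name and item[0] in "+-":
            enabled = item[0] == "+"
    return enabled


# "UseAkiExtension" is a hypothetical gate name used only for illustration.
print(gate_enabled("UseAkiExtension", Stage.ALPHA, ""))                  # False
print(gate_enabled("UseAkiExtension", Stage.ALPHA, "+UseAkiExtension"))  # True
print(gate_enabled("UseAkiExtension", Stage.BETA, "-UseAkiExtension"))   # False
print(gate_enabled("UseAkiExtension", Stage.GA, "-UseAkiExtension"))     # True
```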

PS: Please keep in mind that sometimes things need to be discussed and agreed on first. This normally happens during the issue triage on the community call.

scholzj avatar May 06 '25 07:05 scholzj

Wdym? Which certificate types are you talking about?

I was talking about client certificates. But now I see you suggest adding AKI to the self-signed CA and making this feature optional. I need some time to investigate how to do this.

This normally happens during the issue triage on the community call.

Should I wait for the result of the next discussion before proceeding?

o-afanasenko avatar May 06 '25 11:05 o-afanasenko

@ppatierno I tried to add AKI to the CA cert, but on Mac it doesn't work. I did some research, and the RFC confirms that omitting it there is allowed: “There is one exception; where a CA distributes its public key in the form of a ‘self-signed’ certificate, the authority key identifier MAY be omitted.” (RFC 3280, Section 4.2.1.1)

o-afanasenko avatar May 06 '25 13:05 o-afanasenko

@o-afanasenko I know but it says "MAY be omitted". Not working on Mac is not a reason for not having it. I can't see Kafka clusters or clients running on Mac in production imho. I would investigate why it's not working on Mac and if there is anything you need to do.

ppatierno avatar May 06 '25 13:05 ppatierno

This normally happens during the issue triage on the community call.

Should I wait for the result of the next discussion before proceeding?

Yes, I think we should first clarify how to proceed with this, to avoid you updating the PR again and again every time someone has a different opinion. If there is general agreement to use a feature gate, it would likely also require a proposal (https://github.com/strimzi/proposals).

I know but it says "MAY be omitted". Not working on Mac is not a reason for not having it. I can't see Kafka clusters or clients running on Mac in production imho. I would investigate why it's not working on Mac and if there is anything you need to do.

I guess it depends on what does not work there. Assuming we are talking about OpenSSL adding the AKI to the CA, I would guess that if it does not work on MacOS it won't work on Linux (assuming it is really OpenSSL that is not working and not LibreSSL). If it would work on Linux, then I guess the only concern would be related unit tests, but I think we skipped them before on MacOS, so we might be able to deal with it the same way again.

However, there are absolutely Kafka clients in production use on MacOS. And likely even much more development. So assuming the AKI CA does not work in any clients on MacOS, I would say it is a blocker and the whole issue would be way more complicated.

scholzj avatar May 06 '25 18:05 scholzj

I would guess that if it does not work on MacOS it won't work on Linux

Not a MacOS user, why this assumption?

However, there are absolutely Kafka clients in production use on MacOS. And likely even much more development. So assuming the AKI CA does not work in any clients on MacOS, I would say it is a blocker and the whole issue would be way more complicated.

AFAIU, the error that @o-afanasenko is facing on MacOS is about the process of generating the self-signed CA certificate with AKI. So it looks like the Kube cluster is running somewhere on MacOS and the operator gets the error when generating the cert. It's not about clients validating. Also, I'm curious about "there are absolutely Kafka clients in production use on MacOS": where do you get a statement like this from? I can see clients on MacOS during development instead.

ppatierno avatar May 07 '25 13:05 ppatierno

I got an empty AKI on MacOS during CA cert generation and a test failed, but I don't think this is a blocker, because in before() there is a guard that skips all SSL tests on MacOS (I commented it out for local development):

Assumptions.assumeTrue(System.getProperty("os.name").contains("nux"));

Certificate generation is executed on Linux in k8s and has nothing to do with Mac OS or any clients.

I am trying to add AKI to the CA certificate, which is redundant IMHO. I just check the test results in CI/CD instead of debugging locally.

When you have a discussion about this problem I think the main question is to decide where AKI should be added: CA certificates, client certificates or broker certificates only (for broker certs I already finished and it is required to close this issue)

o-afanasenko avatar May 07 '25 13:05 o-afanasenko

I would guess that if it does not work on MacOS it won't work on Linux

Not a MacOS user, why this assumption?

I would expect OpenSSL to do the same on all operating systems it supports.

scholzj avatar May 07 '25 21:05 scholzj

On the topic of cert-manager, there doesn't seem to be any support currently, but I've asked some of the project members whether they think it's something that might get added in future. I'll report back if an issue for it gets raised.

katheris avatar May 12 '25 14:05 katheris

Triaged on 29.5.2025: @o-afanasenko are you still doing some kind of investigation around this? @katheris did you get some info from the cert-manager community?

im-konge avatar May 29 '25 08:05 im-konge

This is already supported in upstream cert-manager, at least in 1.15+, for X.509 certificates issued by the in-tree cert-manager issuers (SelfSigned and CA). For other issuers, results may vary, because cert-manager-controller itself does not generate the certificates in those cases.

Quick example to reproduce the field presence on py3.13:

$ oc create -f -
apiVersion: v1
kind: Namespace
metadata:
  name: sandbox
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-selfsigned-ca
  namespace: sandbox
spec:
  isCA: true
  commonName: my-selfsigned-ca
  secretName: root-secret
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: my-ca-issuer
  namespace: sandbox
spec:
  ca:
    secretName: root-secret
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: child-cert
  namespace: sandbox
spec:
  isCA: true
  commonName: service-a.default.svc
  secretName: service-a-cert
  issuerRef:
    name: my-ca-issuer
    kind: Issuer
    group: cert-manager.io

$ oc get secret -n sandbox service-a-cert -o json | jq '.data["tls.crt"]' -r | base64 -d > service-a-cert.pem

$ pyenv local 3.13
$ python3
Python 3.13.2 (main, Jun  3 2025, 00:41:39) [Clang 17.0.0 (clang-1700.0.13.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> ssl._ssl._test_decode_cert("service-a-cert.pem")
{'subject': ((('commonName', 'service-a.default.svc'),),), 'issuer': ((('commonName', 'my-selfsigned-ca'),),), 'version': 3, 'serialNumber': '8659D93461EFC03235B35D7F57C8120A', 'notBefore': 'Jun  2 19:07:08 2025 GMT', 'notAfter': 'Aug 31 19:07:08 2025 GMT'}

>>> from cryptography import x509
>>> from cryptography.hazmat.backends import default_backend
>>> with open("service-a-cert.pem", "rb") as cert_file:
...     cert_data = cert_file.read()
...     
>>> cert = x509.load_pem_x509_certificate(cert_data, default_backend())
>>> aki = cert.extensions.get_extension_for_class(x509.AuthorityKeyIdentifier).value
>>> aki
<AuthorityKeyIdentifier(key_identifier=b'9\xfe\xc9t\xcf\xb4\x8fL\xf2\xd4\xdb\x97\xcf\x1e\xe2\xf5\x9c\xcf\x86\x97', authority_cert_issuer=None, authority_cert_serial_number=None)>

swghosh avatar Jun 02 '25 20:06 swghosh

Yes as @swghosh stated, it looks like it's already supported in cert-manager, so no concerns there. Thanks @swghosh

katheris avatar Jun 04 '25 10:06 katheris

@katheris I tried to add AKI for CA certificates but it is not possible to do this easily (I can add either a lot of code just to support only AKI for CA or add AKI to all certificates). So for now I think this PR is enough just to fix the basic mentioned problem. What do you mean by "no concerns"? Is my PR ready to merge or to decline?

o-afanasenko avatar Jun 04 '25 13:06 o-afanasenko

What do you mean by "no concerns"? Is my PR ready to merge or to decline?

@o-afanasenko my comment was in reference to whether this would impact the work I am doing to integrate cert-manager with Strimzi. The comments about whether your PR is ready will be added to the pull request directly.

katheris avatar Jun 05 '25 08:06 katheris

Triaged on 26.6.2025: We should keep it open. @katheris, could you please have a look at the PR from @o-afanasenko? Thanks!

im-konge avatar Jun 26 '25 08:06 im-konge

Triaged on 10.7.2025: we agreed that this issue needs a proposal to better discuss whether the AKI is needed just for broker certificates or whether it's better to cover all the certificates in the cluster, including the CA and user ones. It could potentially need a feature gate to enable the AKI, and we should discuss whether the rollout would happen on upgrade or on the next generation of certificates. @o-afanasenko are you still interested in working on it and on the proposal?

ppatierno avatar Jul 10 '25 16:07 ppatierno

@ppatierno Yes, I am interested in working on it. @katheris I am not sure how to write a proposal. Could you give me a link to some examples, please?

o-afanasenko avatar Jul 29 '25 09:07 o-afanasenko

@o-afanasenko you can find all the Strimzi proposals in the https://github.com/strimzi/proposals repo. You can start writing one from the template, or look at how the others are structured.

ppatierno avatar Jul 29 '25 09:07 ppatierno