Extend the automatic discovery of bootstrap address feature to handle TLS listeners
📖 Summary
Extend the automatic discovery of bootstrap address feature to handle TLS listeners. This will reduce manual configuration and create a tighter, more declarative integration with the Strimzi ecosystem.
🤔 Current Behavior (as at main)
Currently, with the integration of the automatic discovery of the bootstrap address of a Strimzi Kafka cluster (#2693), we are able to retrieve the bootstrap address of plain listeners by using the strimziKafkaRef section in the KafkaService. However, this does not yet support TLS: when dealing with a TLS-protected Strimzi Kafka cluster, the user still needs to fall back to the old approach of manually setting the bootstrap address and then using the trustAnchorRef field to configure trust for the Kafka cluster.
Example KafkaService CR with a manually configured bootstrapServers:
apiVersion: kroxylicious.io/v1alpha1
kind: KafkaService
metadata:
  name: my-cluster
  namespace: my-proxy
spec:
  bootstrapServers: my-cluster-kafka-bootstrap.kafka.svc.cluster.local:9093
  nodeIdRanges:
    - start: 0
      end: 2
  tls:
    trustAnchorRef:
      name: my-cluster-clients-ca-cert
      kind: ConfigMap
      key: ca.pem
This approach is manual, prone to typos, and requires the user to know the exact service address. If the Strimzi-managed cluster changes its listener configuration, this KafkaService resource must be updated manually.
✨ Proposed Solution
We propose extending this new integration to work with TLS. If a user uses the strimziKafkaRef field in their KafkaService CR, then the Kroxylicious operator should be able to proxy the cluster without the user needing to explicitly configure trust.
Example of the KafkaService CR:
apiVersion: kroxylicious.io/v1alpha1
kind: KafkaService
metadata:
  name: my-cluster
  namespace: my-proxy
spec:
  ref:
    group: kafka.strimzi.io
    kind: Kafka
    name: my-cluster # Name of the Kafka CR
    namespace: kafka # Namespace of the Kafka CR
    listenerName: tls # The specific listener to use from the Kafka CR status
The operator would look at the specified Strimzi Kafka resource, find the listener named tls in its status, extract the bootstrapServers from there, and add it to the bootstrapServers field in the KafkaService status.
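For reference, the relevant part of a Strimzi Kafka CR status looks roughly like this (values are illustrative; a listener named tls on port 9093 is assumed):

# Illustrative extract of a Strimzi Kafka CR status
status:
  listeners:
    - name: tls
      bootstrapServers: my-cluster-kafka-bootstrap.kafka.svc:9093
      addresses:
        - host: my-cluster-kafka-bootstrap.kafka.svc
          port: 9093
      certificates:
        - |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----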
The operator will then look at the <cluster_name>-cluster-ca-cert Secret that is generated by Strimzi, and at the ca.crt entry within that Secret.
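That Strimzi-generated cluster CA Secret has roughly this shape (illustrative; the actual name depends on the cluster name):

apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-cluster-ca-cert   # <cluster_name>-cluster-ca-cert
  namespace: kafka
type: Opaque
data:
  ca.crt: <base64-encoded PEM CA certificate>
  ca.p12: <base64-encoded PKCS#12 trust store>
  ca.password: <base64-encoded trust store password>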
The status of the KafkaService should reflect the name, kind, and key of the certificate reference for the Strimzi Kafka cluster:
kind: KafkaService
apiVersion: kroxylicious.io/v1alpha1
metadata:
  name: fooref
  namespace: proxy-ns
  generation: 6
spec:
  bootstrapServers: second-kafka.kafka2.svc.cluster.local:9092
status:
  bootstrapServers: my-cluster-kafka-bootstrap.kafka.svc.cluster.local:9093
  trustAnchorRef:
    name: <name of the cert>
    kind: <kind of the cert>
    key: <key of the cert>
Current Behavior (as at v0.13.0.)
You are describing behaviour on main. 0.13.0 was released several months ago and has none of this, of course.
The operator will then look at the at the cluster-ca-cert secret that is generated by Strimzi and extract the ca.crt entry within that secret.
"extract" is the wrong word. We are relying Strimzi's public API but we still need to validate things are as we expect.
https://strimzi.io/docs/operators/latest/deploying#cluster_ca_secrets
"Table 20. Fields in the <cluster_name>-cluster-ca-cert secret" says there will be a key called ca.crt. The status section just needs to reference that.
I think our operator should check that the expected Secret exists. If it doesn't, probably use a ReferencedResourceNotReconciled.
Check that the expected key exists. If it doesn't, probably use an InvalidReferencedResource.
Finally, I think if the user specifies a spec.tls.trustAnchorRef, we should use that in preference to the one provided by Strimzi. Our own configuration should trump.
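For example, something like this (a sketch, reusing the proposed ref shape from the issue description) would ignore the Strimzi-provided CA and use the user's own ConfigMap:

spec:
  ref:
    group: kafka.strimzi.io
    kind: Kafka
    name: my-cluster
    namespace: kafka
    listenerName: tls
  tls:
    trustAnchorRef:      # explicitly configured trust wins over the Strimzi CA
      name: my-own-trust-bundle
      kind: ConfigMap
      key: ca.pem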
One thing troubles me. If the Strimzi user opts for a certificate signed by a real CA (Verisign etc.), then we (as a client) don't need a trust anchor. We should use system trust. Taking the trust anchor from Strimzi in this situation is harmful. It is like me sniffing the certificate from google.com, installing it in the browser, then connecting believing my connection with google.com is secure. It isn't: my chain of trust relies on something presented by the peer. I might be talking to an imposter.
I think the issue here is actually with Strimzi. How does a client know when it needs to use the cluster-ca-cert to connect and when it doesn't? Does it just boil down to a priori knowledge of how certificates are set up in Strimzi?
What am I missing?
@katheris WDYT?
I was reading through the certificate-related notes and docs today, and I think that to determine whether to use the internal cluster-ca-cert or not, the operator can check for the brokerCertChainAndKey field in the Kafka listener configuration in the Kafka CR. The brokerCertChainAndKey field provides the ability to explicitly define the secret, certificate, and key that we want the operator to use.
So there are several scenarios for dealing with TLS in Strimzi in terms of CA certificates, just considering the encryption part of things and leaving out mTLS:
- When the user leverages Strimzi to generate the CA certificate (which is the default behaviour), you can find such a CA certificate within the <cluster_name>-cluster-ca-cert Secret, but also within the Kafka custom resource status (in the listeners section).
- When the user provides their own CA certificate, they are going to provide the <cluster_name>-cluster-ca-cert Secret themselves, so from a client perspective it is not that much different from the first use case.
- Additionally, within the Kafka listeners in the spec, the user can also decide to use a specific certificate for a specific listener, and this is where brokerCertChainAndKey comes into the picture, configured within the listener itself (see the sketch just after this list). Of course, it specifies the certificate (and corresponding key) to be served by the listener, but it doesn't carry any information about how that certificate was signed (by which CA). This is the case when such a certificate was signed by a well-known external CA, like Let's Encrypt. In this case, the client doesn't have to get any CA cert because, from a Java perspective, it's well known and already loaded in the system (i.e. the Java keystore already contains such CAs). There is a Strimzi blog post showing this: https://strimzi.io/blog/2021/05/07/deploying-kafka-with-lets-encrypt-certificates/
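For illustration, a listener configured with brokerCertChainAndKey looks roughly like this (the Secret name and entry names are examples):

# Illustrative Strimzi Kafka listener using a custom certificate
spec:
  kafka:
    listeners:
      - name: external
        port: 9094
        type: loadbalancer
        tls: true
        configuration:
          brokerCertChainAndKey:
            secretName: my-custom-certificate   # Secret provided by the user
            certificate: tls.crt                # entry holding the certificate chain
            key: tls.key                        # entry holding the private key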
How does a client know when it needs to use the cluster-ca-cert to connect and when it doesn't? Does it just boil down to a priori knowledge of how certificates are set up in Strimzi?
Mostly yes. In the third scenario, Strimzi is totally unaware of how the certificate on the listener was signed and it doesn't have any clue about the CA that was used.
Regardless of whether Strimzi is issuing certificates using its own CA, or whether the person deploying the Kafka cluster provided certificates for the Kafka listeners, I agree with Keith that from a security perspective the most secure option is for Kroxylicious to be configured with its own trust anchor, rather than blindly taking one from a Secret owned by Strimzi.
However, I do think there are circumstances where using the trust anchor in the <cluster_name>-cluster-ca-cert Secret is valid. The most obvious is in development environments, but even in production, if Kroxylicious and Strimzi are deployed in the same Kubernetes cluster, then it might be deemed acceptable to use the same Secret for both operators. That Secret may be managed by the owner of the cluster, rather than being created by Strimzi. So my feeling is you should support both options.
Just to throw another option in, what about the Strimzi Access Operator? Rather than having Kroxylicious inspect all the different Secrets and CRs for the credentials, should you be able to reference a KafkaAccess CR and then have Kroxylicious take the credentials from the resulting Secret? The idea of the Access Operator was to provide an easier way for Kubernetes-based applications to get all the necessary credentials to connect to a particular listener in a Strimzi Kafka cluster, so it seems appropriate to use that here. The cons are that the user has to deploy another operator and CR, but if the configuration of the listener changes then the Access Operator automatically updates the Secret fields, so it would presumably mean less overhead for the Kroxylicious operator in terms of watching Kubernetes objects.
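For context, a KafkaAccess CR is roughly this shape (illustrative; check the Access Operator docs for the exact API):

apiVersion: access.strimzi.io/v1alpha1
kind: KafkaAccess
metadata:
  name: my-cluster-access
  namespace: my-proxy
spec:
  kafka:
    name: my-cluster   # the Strimzi Kafka CR to connect to
    namespace: kafka
    listener: tls      # the listener whose credentials should be exposed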
Regardless of whether Strimzi is issuing certificates using its own CA, or whether the person deploying the Kafka cluster provided certificates for the Kafka listeners, I agree with Keith that from a security perspective the most secure option is for Kroxylicious to be configured with its own trust anchor, rather than blindly taking one from a Secret owned by Strimzi.
However, I do think there are circumstances where using the trust anchor in the <cluster_name>-cluster-ca-cert Secret is valid.
Agreed on both points. I'm thinking that the Kroxylicious feature to trust <cluster_name>-cluster-ca-cert should be controlled by a flag. Following the secure-by-default mantra, the flag needs to be off by default.
trustStrimziCaCertificate: true|false # false is the default
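For example, it might sit alongside the ref in the KafkaService spec (a sketch only; neither the flag name nor its placement is decided):

spec:
  ref:
    group: kafka.strimzi.io
    kind: Kafka
    name: my-cluster
    namespace: kafka
    listenerName: tls
  tls:
    trustStrimziCaCertificate: true   # hypothetical flag; defaults to false (secure by default)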
Just to throw another option in, what about the Strimzi Access Operator? Rather than having Kroxylicious inspecting all the different Secrets and CRs for the credentials, should you be able to reference a KafkaAccess CR and then have Kroxylicious take the credentials from the resulting Secret?
I think this is something we'll definitely want eventually. Currently we are focusing on SASL pass-through use-cases where the proxy does not authenticate to the broker in its own right. I think once we get to SASL initiation use-cases, the Strimzi Access Operator becomes a natural integration point. I've not thought about this too deeply, so I could be off the mark. Am I missing something?
@ShubhamRwt @ppatierno so in the case where brokerCertChainAndKey is used, does Strimzi still make a <cluster_name>-cluster-ca-cert secret?
@k-wall I think to achieve that, when we are using our own CA certificates and key, then we need to use
kind: Kafka
apiVersion: kafka.strimzi.io/v1beta2
spec:
  # ...
  clusterCa:
    generateCertificateAuthority: false
in our Kafka CR. We can do the same for the clients CA too:
  clientsCa:
    generateCertificateAuthority: false
@k-wall when you don't disable cluster CA auto-generation on the Kafka custom resource to bring your own, Strimzi will always generate the Secret. But even when disabling cluster CA auto-generation, it will be brought by the user ... so yes that Secret is always in place given the nature of Strimzi to set up the TLS connections internally between all nodes (cannot be disabled), even if you don't use TLS encryption with clients.
Then the brokerCertChainAndKey field on a specific listener just says to use a different certificate for that listener which was not signed by the cluster CA (of course you could even use such CA to sign it but it would not make much sense because it is the Strimzi default for TLS enabled listeners).
@k-wall I think to achieve that, when we are using our own CA certificates and key, then we need to use (snip)
If I am understanding things correctly, we don't need to consider generateCertificateAuthority at all.
I think the Kroxylicious operator should:
if trustStrimziCaCertificate is true
    if brokerCertChainAndKey set on listener
        use ref to the cert from brokerCertChainAndKey
    else
        use ref to the cert from <cluster_name>-cluster-ca-cert
    end if
end if
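Purely for illustration, the trust anchor reference published in the KafkaService status in each branch might look something like this (names reuse the examples above; the exact status shape is still to be decided):

# brokerCertChainAndKey set on the listener
status:
  trustAnchorRef:
    kind: Secret
    name: my-custom-certificate   # secretName from brokerCertChainAndKey
    key: tls.crt
---
# otherwise: fall back to the Strimzi cluster CA
status:
  trustAnchorRef:
    kind: Secret
    name: my-cluster-cluster-ca-cert
    key: ca.crt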
Yes @k-wall, we should follow this approach. I left my idea on the Kroxylicious Slack channel about how I was planning to implement it. The logic I have started working on would be similar to the snippet you shared above.
I left my idea on the Kroxylicious Slack channel
Probably best to keep the ideas in one place.
I think a design proposal is in order. The template is here.
https://github.com/kroxylicious/design/blob/main/proposals/000-template.md
You don't need to wait for the approval of the design proposal to start working on the code.
Thanks for sharing, Keith. I will start on the proposal as I progress with my POC. Playing with the POC may allow me to find something that is still undiscovered, and we can then discuss it in the proposal.
It is worth writing down your ideas first, so that others can comment / validate them. It doesn't need to be polished. The proposal surfaces the public API which is usually where there's the most debate. If the POC causes a rethink of the proposal, that's fine.