Setup cluster with TLS using a K8s secret doesn't work

Open itamararjuan opened this issue 1 year ago • 4 comments

What happened?

My network is internal to my EKS cluster, so I cannot use cert-manager to create a Let's Encrypt certificate. Instead, I used certbot to create the certificates for my brokers with a DNS challenge, which I completed manually.

I then put the certificate files into a K8s secret, as the documentation describes, and started a cluster referencing that secret in the external listeners section.

What did you expect to happen?

I expected the cluster to consume the secret and configure TLS on the brokers correctly. Instead, the cluster didn't come up, and the pod logs said there is no such file as ca.crt. The certificate was issued by Let's Encrypt, a public CA, so I did not expect to have to supply a ca.crt file.

I also tried to work around this by creating a generic secret containing the Let's Encrypt chain (ca.crt), which certbot provided alongside the certificate, and referencing that generic secret as well, but again the cluster won't come up.

Essentially, I'm not able to set up a TLS-enabled cluster in any mode other than a self-signed certificate issued by cert-manager.
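
For reference, a single-secret variant of this attempt could be sketched as below. It assumes the certbot output was copied into a local directory as `tls.crt`, `tls.key`, and `ca.crt` (hypothetical file names), and that the chart reads `ca.crt` from the same secret as the key pair; neither assumption is confirmed in this report. `KUBECTL` and the function name are introduced here only so the command can be previewed without a cluster.

```shell
#!/bin/sh
# Sketch only: build one generic secret holding tls.crt, tls.key and ca.crt.
# KUBECTL is a hypothetical override (e.g. KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

create_tls_secret_with_ca() {
  name="$1"       # secret name, e.g. rp-data-tls
  namespace="$2"  # namespace, e.g. redpanda-data
  dir="$3"        # directory holding tls.crt, tls.key, ca.crt (assumed layout)

  # --from-file=KEY=PATH stores the file content under the given secret key.
  "$KUBECTL" create secret generic "$name" \
    --from-file=tls.crt="$dir/tls.crt" \
    --from-file=tls.key="$dir/tls.key" \
    --from-file=ca.crt="$dir/ca.crt" \
    --namespace "$namespace"
}
```

Setting `KUBECTL=echo` before calling `create_tls_secret_with_ca rp-data-tls redpanda-data certs` prints the command instead of running it.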

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef:
    chartVersion: 5.9.6
  clusterSpec:
    external:
      enabled: true
      domain: redpanda.example.com
      type: NodePort
    tls:
      enabled: true
      certs:
        external:
          secretRef:
            name: rp-data-tls

    auth:
      sasl:
        enabled: true
        users:
          - name: superuser
            password: secretpassword
    storage:
      persistentVolume:
        enabled: true
        storageClass: gp2 # just an example

Anything else we need to know?

No response

Which are the affected charts?

No response

Chart Version(s)

5.9.6

Cloud provider

AWS - EKS

JIRA Link: K8S-394

itamararjuan avatar Oct 20 '24 15:10 itamararjuan

Could you share the logs from Redpanda when it refuses to start, along with the redacted contents of your secret? Really, just the keys would be fine.

chrisseto avatar Oct 22 '24 15:10 chrisseto

Just saw now that you replied - of course!

Let's start with the secret. I'm creating a TLS-type secret using the method given in the documentation, like so:

kubectl create secret tls rp-data-tls \
  --cert=certs/tls.crt \
  --key=certs/tls.key \
  --namespace redpanda-data

The files are the actual certificate that I made with certbot. I copied and renamed them from the following files that certbot produced:

I renamed cert.pem to tls.crt and privkey.pem to tls.key and referenced those files here.

When the nodes try to come up, their logs complain that ca.crt is not available (as if Redpanda expects it), even though this certificate was issued by a public CA (Let's Encrypt, using the certbot CLI for macOS).

The redacted private key content is:

-----BEGIN PRIVATE KEY-----
M
[...REDACTED]
88
-----END PRIVATE KEY-----

and the tls.crt which I've used looks something like this:

-----BEGIN CERTIFICATE-----
MIIEHTCCA6KgAwIBAgISA5tlmfIeVm/VqHhsg/lLFB+pMAoGCCqGS
CzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0
[...REDACTED]
th4iVJfjNpZlf7U1VWEoBhYwqU4Mb7rolmPEN4gptQcBSOgqEOh18
ALBwQumX89lBh5dGtNGgCbgBPNVd6HMyqSKF7ob4IBnbbg7UR
Pg==
-----END CERTIFICATE-----
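
Since this tls.crt came from certbot's cert.pem, it may be worth checking how many certificates the file actually contains. A count of 1 means only the leaf certificate is present, while a higher count means intermediate certificates are bundled in. A minimal sketch; the function name is made up for illustration:

```shell
#!/bin/sh
# Count the PEM certificate blocks in a file.
# Each certificate contributes exactly one BEGIN CERTIFICATE line.
count_pem_certs() {
  # -c prints the number of matching lines; -e protects the leading dashes
  # of the pattern from being parsed as options.
  grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}
```

For example, `count_pem_certs certs/tls.crt` prints the number of certificates in that file.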

itamararjuan avatar Nov 10 '24 11:11 itamararjuan

My YAML for creating the cluster is the following:

apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
  name: redpanda
spec:
  chartRef:
    chartVersion: 5.9.9
  clusterSpec:
    nodeSelector:
      purpose: "red-panda-cluster"
    external:
      enabled: true
      domain: redpanda-data.k8s.staging.mycompanydomain.com
      type: NodePort
    tls:
      enabled: true
      certs:
        external:
          secretRef:
            name: rp-data-tls

    auth:
      sasl:
        enabled: true
        users:
          - name: superuser
            password: secretpassword
    storage:
      persistentVolume:
        enabled: true
        storageClass: csi-driver-ebs-gp3

itamararjuan avatar Nov 10 '24 11:11 itamararjuan

INFO  2024-11-10 12:46:40,982 [shard 0:main] cluster - Using index based node ID {2}
INFO  2024-11-10 12:46:40,983 [shard 0:main] main - application.cc:2611 - Starting Redpanda with node_id 2, cluster UUID {nullopt}
INFO  2024-11-10 12:46:40,983 [shard 0:main] raft - coordinated_recovery_throttle.cc:126 - Starting recovery throttle, rate: 104857600
INFO  2024-11-10 12:46:40,983 [shard 0:main] cluster - producer_state_manager.cc:45 - Started producer state manager
INFO  2024-11-10 12:46:40,983 [shard 0:main] main - application.cc:1583 - Partition manager started
INFO  2024-11-10 12:46:40,983 [shard 0:main] security - authorizer.h:273 - Registered superuser account: type {user} name {kubernetes-controller}
INFO  2024-11-10 12:46:40,983 [shard 0:main] security - authorizer.h:273 - Registered superuser account: type {user} name {superuser}
INFO  2024-11-10 12:46:40,983 [shard 0:main] main - application.cc:1671 - Archiver service setup, cloud_storage_enabled: false, legacy_upload_mode_enabled: true
INFO  2024-11-10 12:46:40,983 [shard 0:main] resource_mgmt - storage.cc:182 - Setting new target log data size 11.711GiB. Disk size 19.518GiB reservation percent 25 target percent {80} bytes {nullopt}
INFO  2024-11-10 12:46:40,987 [shard 0:main] rpc - server.cc:284 - vectorized internal rpc protocol - Stopping 1 listeners
INFO  2024-11-10 12:46:40,987 [shard 0:main] rpc - server.cc:296 - vectorized internal rpc protocol - Shutting down 0 connections
INFO  2024-11-10 12:46:40,987 [shard 0:main] resource_mgmt - storage.cc:88 - Stopping disk space manager service
INFO  2024-11-10 12:46:40,987 [shard 0:main] auditing - audit_log_manager.cc:792 - Shutting down audit log manager
INFO  2024-11-10 12:46:40,987 [shard 0:main] auditing - audit_log_manager.cc:563 - stop() invoked on audit_sink
INFO  2024-11-10 12:46:40,987 [shard 0:main] auditing - audit_log_manager.cc:600 - Setting auditing enabled state to: false
INFO  2024-11-10 12:46:40,987 [shard 0:main] auditing - audit_log_manager.cc:654 - Ignored update to audit_enabled(), auditing is already disabled
INFO  2024-11-10 12:46:40,988 [shard 0:main] raft - coordinated_recovery_throttle.cc:134 - Stopping recovery throttle
INFO  2024-11-10 12:46:40,988 [shard 0:main] kvstore - kvstore.cc:125 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/0_0
INFO  2024-11-10 12:46:41,022 [shard 0:main] main - application.cc:461 - Shutdown complete.
ERROR 2024-11-10 12:46:41,022 [shard 0:main] main - application.cc:487 - Failure during startup: std::__nested<std::runtime_error> (Could not read trust file /etc/tls/certs/external/ca.crt): std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: open failed: No such file or directory ["/etc/tls/certs/external/ca.crt"])
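
The error above shows the broker trying to read ca.crt from /etc/tls/certs/external. A secret created with `kubectl create secret tls` only carries the keys tls.crt and tls.key, so a quick way to confirm what is missing is to check a directory for the three file names. This is a rough sketch: the function name is hypothetical, and it assumes the secret's keys appear as files directly in the directory being checked.

```shell
#!/bin/sh
# Print the name of each expected TLS file that is absent from a directory.
missing_tls_files() {
  dir="$1"  # e.g. /etc/tls/certs/external inside the pod, or a local copy
  for f in tls.crt tls.key ca.crt; do
    # -f follows symlinks, which is how mounted secret keys are exposed.
    [ -f "$dir/$f" ] || printf '%s\n' "$f"
  done
}
```

Run as `missing_tls_files /etc/tls/certs/external` from a shell where that directory is available; it prints nothing when all three files exist.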

itamararjuan avatar Nov 10 '24 12:11 itamararjuan