🔹🐛 Operator is unable to check for cluster readiness over an Admin API listener with TLS enabled.

Open c4milo opened this issue 10 months ago • 9 comments

What happened?

In config-watcher container:

Waiting for cluster to be ready
unable to request cluster health: Get "http://redpanda-1.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster/health_overview"
│ Error: (1) occurred at line 12

What did you expect to happen?

rpk cluster health --watch --exit-when-healthy -X admin.tls.enabled=true -X admin.tls.insecure_skip_verify=true

It would also be great if I could pass admin.tls.insecure_skip_verify through the CR.

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

COMPUTED VALUES:
adminAPIListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 9644
  auth_method: none
  enabled: true
  name: admin-api.internal
  port: 9644
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 30644
  auth_method: sasl
  enabled: false
  name: admin-api
  port: 30644
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
authSASLEnabled: false
authSASLMechanism: SCRAM-SHA-512
authSASLSecretRef: redpanda/redpanda-superusers
baseDNSName: camilo.panda.dev
brokerMemorySizeMiB: 8192
brokerVCPUCount: 3
clusterConfig:
  cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
  cloud_storage_azure_storage_account: testcamilo9
  cloud_storage_credentials_source: azure_aks_oidc_federation
  cloud_storage_enable_remote_read: "true"
  cloud_storage_enable_remote_write: "true"
  cloud_storage_enabled: "false"
  default_topic_replications: "3"
  minimum_topic_replications: "3"
cmdline:
- --abort-on-seastar-bad-alloc
- --dump-memory-diagnostics-on-alloc-failure-kind=all
containerImage:
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: v23.3.7
httpProxyListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 8082
  auth_method: http_basic
  enabled: true
  name: http-proxy.internal
  port: 8082
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 31082
  auth_method: http_basic
  enabled: true
  name: http-proxy
  port: 31082
  tls_cert: letsencrypt
  tls_enabled: true
  tls_require_client_auth: false
  tls_truststore: ""
internalClusterDomain: cluster.local
kafkaAPIListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 9092
  auth_method: sasl
  enabled: true
  name: kafka-api.internal
  port: 9092
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 32092
  auth_method: sasl
  enabled: true
  name: kafka-api
  port: 32092
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
licenseSecretRef:
  key: license
  name: redpanda-9m4e2mr0ui3e8a215n4g-license
logLevel: debug
nodeConfig: {}
nodeCount: 3
nodeSelector:
  cloud.redpanda.com/role: redpanda
operatorEnabled: true
operatorForceHelmUpdate: 1712029669
podLabels:
  azure.workload.identity/use: "true"
rackAwareness:
  annotation: topology.kubernetes.io/zone
  enabled: true
redpandaClusterID: 9m4e2mr0ui3e8a215n4g
rpcListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 33145
  auth_method: none
  enabled: true
  name: rpc.internal
  port: 33145
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
schemaRegistryListeners:
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 8081
  auth_method: http_basic
  enabled: true
  name: schema-registry.internal
  port: 8081
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
- address: 0.0.0.0
  advertised_address: 127.0.0.1
  advertised_port: 31081
  auth_method: http_basic
  enabled: true
  name: schema-registry
  port: 31081
  tls_cert: letsencrypt
  tls_require_client_auth: false
  tls_truststore: ""
serviceAccount:
  annotations:
    azure.workload.identity/client-id: c90db393-857d-41d0-ac0d-0e61271fcaa6
  create: true
  labels:
    app.kubernetes.io/managed-by: redpanda
  name: id-rpcloud-9m4e2mr0ui3e8a215n4
storageClass: local-path
storageSizeGiB: 4096
tlsCertificates:
- ca_enabled: false
  duration: 43800h
  issuer_kind: ClusterIssuer
  issuer_ref: letsencrypt-dns-prod
  name: letsencrypt
tlsEnabled: true
tolerations:
- effect: NoSchedule
  key: cloud.redpanda.com/role
  operator: Equal
  value: redpanda
tunableConfig: {}

Anything else we need to know?

Name:         redpanda
Namespace:    redpanda
Labels:       app.kubernetes.io/component=redpanda
              app.kubernetes.io/instance=redpanda
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=redpanda
              helm.sh/chart=redpanda-5.7.36
              helm.toolkit.fluxcd.io/name=redpanda
              helm.toolkit.fluxcd.io/namespace=redpanda
Annotations:  meta.helm.sh/release-name: redpanda
              meta.helm.sh/release-namespace: redpanda

Data
====
bootstrap.yaml:
----
kafka_enable_authorization: false
enable_sasl: false
enable_rack_awareness: true
cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
cloud_storage_azure_storage_account: testcamilo9
cloud_storage_credentials_source: azure_aks_oidc_federation
cloud_storage_enable_remote_read: "true"
cloud_storage_enable_remote_write: "true"
cloud_storage_enabled: "false"
      
default_topic_replications: 3
minimum_topic_replications: "3"

compacted_log_segment_size: 67108864
group_topic_partitions: 16
kafka_batch_max_bytes: 1048576
kafka_connection_rate_limit: 1000
log_segment_size: 134217728
log_segment_size_max: 268435456
log_segment_size_min: 16777216
max_compacted_log_segment_size: 536870912
topic_partitions_per_shard: 1000
storage_min_free_bytes: 5368709120

audit_enabled: false

redpanda.yaml:
----
config_file: /etc/redpanda/redpanda.yaml
cluster_id: 9m4e2mr0ui3e8a215n4g
redpanda:
  empty_seed_starts_cluster: false
  kafka_enable_authorization: false
  enable_sasl: false
  cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
  cloud_storage_azure_storage_account: testcamilo9
  cloud_storage_credentials_source: azure_aks_oidc_federation
  cloud_storage_enable_remote_read: "true"
  cloud_storage_enable_remote_write: "true"
  cloud_storage_enabled: "false"
  default_topic_replications: "3"
  minimum_topic_replications: "3"
  compacted_log_segment_size: 67108864
  group_topic_partitions: 16
  kafka_batch_max_bytes: 1048576
  kafka_connection_rate_limit: 1000
  log_segment_size: 134217728
  log_segment_size_max: 268435456
  log_segment_size_min: 16777216
  max_compacted_log_segment_size: 536870912
  topic_partitions_per_shard: 1000
  storage_min_free_bytes: 5368709120
    
  crash_loop_limit: "5"
  audit_enabled: false


  admin:
    - name: internal
      address: 0.0.0.0
      port: 9644
    - name: default
      address: 0.0.0.0
      port: 9645
  admin_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      
      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
  kafka_api:
    - name: internal
      address: 0.0.0.0
      port: 9092
      authentication_method: sasl
    - name: default
      address: 0.0.0.0
      port: 9094
    - name: kafka-api
      address: 0.0.0.0
      port: 32092
      authentication_method: sasl
  kafka_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      
      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: kafka-api
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      
      truststore_file: /etc/ssl/certs/ca-certificates.crt
  rpc_server:
    address: 0.0.0.0
    port: 33145
  rpc_server_tls:
    enabled: true
    cert_file: /etc/tls/certs/letsencrypt/tls.crt
    key_file: /etc/tls/certs/letsencrypt/tls.key
    require_client_auth: false
    truststore_file: /etc/ssl/certs/ca-certificates.crt
  seed_servers: 
    - host:
        address: redpanda-0.redpanda.redpanda.svc.cluster.local.
        port: 33145
    - host:
        address: redpanda-1.redpanda.redpanda.svc.cluster.local.
        port: 33145
    - host:
        address: redpanda-2.redpanda.redpanda.svc.cluster.local.
        port: 33145

schema_registry_client:
  brokers:
  - address: redpanda-0.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-1.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-2.redpanda.redpanda.svc.cluster.local.
    port: 9092
  broker_tls:
    enabled: true
    require_client_auth: false
    cert_file: /etc/tls/certs/letsencrypt/tls.crt
    key_file: /etc/tls/certs/letsencrypt/tls.key
    truststore_file: /etc/ssl/certs/ca-certificates.crt
schema_registry:
  schema_registry_api:
    - name: internal
      address: 0.0.0.0
      port: 8081
      authentication_method: http_basic
    - name: default
      address: 0.0.0.0
      port: 8084
    - name: schema-registry
      address: 0.0.0.0
      port: 31081
      authentication_method: http_basic
  schema_registry_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: schema-registry
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt

pandaproxy_client:
  brokers:
  - address: redpanda-0.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-1.redpanda.redpanda.svc.cluster.local.
    port: 9092
  - address: redpanda-2.redpanda.redpanda.svc.cluster.local.
    port: 9092
  broker_tls:
    enabled: true
    require_client_auth: false
    cert_file: /etc/tls/certs/letsencrypt/tls.crt
    key_file: /etc/tls/certs/letsencrypt/tls.key
    truststore_file: /etc/ssl/certs/ca-certificates.crt
pandaproxy:
  pandaproxy_api:
    - name: internal
      address: 0.0.0.0
      port: 8082
      authentication_method: http_basic
    - name: default
      address: 0.0.0.0
      port: 8083
    - name: http-proxy
      address: 0.0.0.0
      port: 31082
      authentication_method: http_basic
  pandaproxy_api_tls:
    - name: internal
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt
    - name: default
      enabled: true
      cert_file: /etc/tls/certs/external/tls.crt
      key_file: /etc/tls/certs/external/tls.key
      require_client_auth: false
      truststore_file: /etc/tls/certs/external/ca.crt
    - name: http-proxy
      enabled: true
      cert_file: /etc/tls/certs/letsencrypt/tls.crt
      key_file: /etc/tls/certs/letsencrypt/tls.key
      require_client_auth: false
      truststore_file: /etc/ssl/certs/ca-certificates.crt


rpk:
  # redpanda server configuration
  overprovisioned: false
  enable_memory_locking: false
  additional_start_flags:
    - "--smp=3"
    - "--memory=5734M"
    - "--reserve-memory=214M"
    - "--default-log-level=debug"
    - --abort-on-seastar-bad-alloc
    - --dump-memory-diagnostics-on-alloc-failure-kind=all
  # rpk tune entries
  tune_aio_events: true

  # kafka connection configuration
  kafka_api:
    brokers: 
      - redpanda-0.redpanda.redpanda.svc.cluster.local.:9092
      - redpanda-1.redpanda.redpanda.svc.cluster.local.:9092
      - redpanda-2.redpanda.redpanda.svc.cluster.local.:9092
    tls:
  admin_api:
    addresses: 
      - redpanda-0.redpanda.redpanda.svc.cluster.local.:9644
      - redpanda-1.redpanda.redpanda.svc.cluster.local.:9644
      - redpanda-2.redpanda.redpanda.svc.cluster.local.:9644
    tls:


BinaryData
====

Events:  <none>

Which are the affected charts?

Redpanda, Operator

Chart Version(s)

❯ helm -n redpanda list
NAME             	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART          	APP VERSION
redpanda         	redpanda 	6       	2024-04-02 01:17:31.253662 -0400 EDT	deployed	redpanda-0.1.1 	0.1.0
redpanda-operator	redpanda 	1       	2024-04-01 18:14:29.732128 -0400 EDT	deployed	operator-0.4.20	v2.1.15-23.3.7

Cloud provider

Azure / AKS

JIRA Link: K8S-131

c4milo avatar Apr 02 '24 05:04 c4milo

I also opened https://github.com/redpanda-data/redpanda/issues/17540

c4milo avatar Apr 02 '24 06:04 c4milo

@c4milo I think I understand this problem now. Do you have a simple sample to test this out? Otherwise I'll assume that just enabling the config watcher and TLS should be enough, right?

alejandroEsc avatar Apr 04 '24 20:04 alejandroEsc

I am not sure what I am missing here. The config-watcher does a good job of getting cluster health given how rpk is set up when TLS is enabled, which was done some time ago. What surprises me is this:

redpanda redpanda 6 2024-04-02 01:17:31.253662 -0400 EDT deployed redpanda-0.1.1 0.1.0

Looking at the repo, we are currently at:

redpanda/redpanda  	5.7.37       	v23.3.10      	Redpanda is the real-time engine for modern apps.

And my local installation:

redpanda    	redpanda    	1       	2024-04-05 07:33:28.463959 -0400 EDT	deployed	redpanda-5.7.37     	v23.3.10

Clearly something is off, can we verify this using the latest charts please? And if we cannot achieve that, is the expectation that we back-port something?

alejandroEsc avatar Apr 05 '24 11:04 alejandroEsc

Does the config map for rpk look like this in your setup? (Screenshot 2024-04-05 at 5 11 59 PM)

c4milo avatar Apr 05 '24 21:04 c4milo

Why is it trying to use the external domain instead of the internal one?

c4milo avatar Apr 05 '24 21:04 c4milo

@alejandroEsc, please let me know if you want to pair on this one.

c4milo avatar Apr 10 '24 13:04 c4milo

@alejandroEsc, please let me know if you want to pair on this one.

Yeah, let me know. With the latest changes, I am hoping this is resolved.

alejandroEsc avatar Apr 12 '24 17:04 alejandroEsc

This issue is probably a symptom of internal certs using the public DNS domain; fixing that should also fix this.

c4milo avatar Apr 14 '24 06:04 c4milo

I haven't tested again but I think this issue may have been fixed by https://github.com/redpanda-data/helm-charts/issues/1155 as well.

c4milo avatar Apr 16 '24 13:04 c4milo

We fixed this issue by configuring rpk correctly. The change is in the ConfigMap that contains redpanda.yaml.

RafalKorepta avatar Aug 02 '24 08:08 RafalKorepta
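
For readers comparing against the ConfigMap dump earlier in this thread: there, `rpk.admin_api.tls` is empty, which would be consistent with the `http://` scheme in the config-watcher error. The fragment below is only a sketch of what a populated section might look like. The addresses and the truststore path are copied from the dump above (the `internal` entry under `admin_api_tls`); whether the fixed chart renders exactly these keys and values is an assumption, not something confirmed in this thread.

```yaml
# Sketch only, not the chart's confirmed output.
# Addresses are taken from the rpk section of the ConfigMap posted above.
rpk:
  admin_api:
    addresses:
      - redpanda-0.redpanda.redpanda.svc.cluster.local.:9644
      - redpanda-1.redpanda.redpanda.svc.cluster.local.:9644
      - redpanda-2.redpanda.redpanda.svc.cluster.local.:9644
    tls:
      # Assumption: reuse the truststore configured for the "internal"
      # admin_api_tls listener in the same redpanda.yaml.
      truststore_file: /etc/ssl/certs/ca-certificates.crt
```

With a non-empty `tls` block, `rpk cluster health` inside the pod should no longer need the `-X admin.tls.enabled=true` override from the original report, assuming the certificate served on port 9644 is valid for the internal hostname.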