Issues with Console and Connectors when mTLS is enabled for the Admin API

Open JakeSCahill opened this issue 2 years ago • 4 comments

What happened?

When enabling mTLS for the Admin API, Console and Connectors fail to start. Console reports that it's missing TLS certs:

 {"level":"info","ts":"2023-10-27T15:11:53.281Z","msg":"testing admin client connectivity","urls":["https://redpanda.redpanda.svc.cluster.local.:9644"]}
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
{"level":"fatal","ts":"2023-10-27T15:11:56.352Z","msg":"failed to create Redpanda service","error":"failed to test admin client connectivity: Get \"https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers\": remote error: tls: certificate required"}

If I try to disable mTLS after enabling it, the post-upgrade job fails with Error: UPGRADE FAILED: post-upgrade hooks failed: job failed: BackoffLimitExceeded.

Post-upgrade logs:

Request error, trying another node: Get "https://redpanda-0.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": remote error: tls: certificate required
Request error, trying another node: Get "https://redpanda-1.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": remote error: tls: certificate required
unable to query config schema: Get "https://redpanda-2.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": dial tcp 10.244.2.3:9644: connect: connection refused

If I re-enable mTLS, Console starts running, but there are issues with Admin API connections.

https://github.com/redpanda-data/helm-charts/assets/45230295/62e3ea6f-cdb4-4eb0-93ca-8a12f9f0ddd7

What did you expect to happen?

Redpanda Console and Connectors should work even if mTLS is enabled.

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

Running in a kind cluster. I had a few overrides as I was testing a few things.

To enable mTLS with Connectors enabled:

export DOMAIN=customredpandadomain.local && \
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true \
  --set connectors.enabled=true \
  --set connectors.deployment.terminationGracePeriodSeconds=300 \
  --set connectors.nameOverride="test-name-2" \
  --set nameOverride="rp-test" \
  --set listeners.admin.tls.requireClientAuth=true \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef=redpanda-superusers
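
Since the template asks for a values file, the same overrides can also be written as one. This is only a sketch that mirrors the --set flags from the command above; the file name mtls-values.yaml is an arbitrary choice for illustration, not something the chart expects.

```shell
# Write the overrides from the --set flags above to a values file.
# mtls-values.yaml is a hypothetical file name chosen for this sketch.
cat > mtls-values.yaml <<'EOF'
nameOverride: rp-test
external:
  domain: customredpandadomain.local
statefulset:
  initContainers:
    setDataDirOwnership:
      enabled: true
connectors:
  enabled: true
  nameOverride: test-name-2
  deployment:
    terminationGracePeriodSeconds: 300
listeners:
  admin:
    tls:
      requireClientAuth: true   # the setting that turns on mTLS for the Admin API
auth:
  sasl:
    enabled: true
    secretRef: redpanda-superusers
EOF

# Applying it needs a cluster, so the command is left commented out here:
# helm upgrade --install redpanda redpanda/redpanda \
#   --namespace redpanda --create-namespace -f mtls-values.yaml
```

Dropping the listeners.admin.tls.requireClientAuth key from the file should give the same overrides as the disable attempt below.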

To try to disable mTLS:

export DOMAIN=customredpandadomain.local && \
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true \
  --set connectors.enabled=true \
  --set connectors.deployment.terminationGracePeriodSeconds=300 \
  --set connectors.nameOverride="test-name-2" \
  --set nameOverride="rp-test" \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef=redpanda-superusers

$ helm get values <redpanda-release-name> -n <redpanda-release-namespace> --all
COMPUTED VALUES:
affinity: {}
auth:
  sasl:
    enabled: true
    mechanism: SCRAM-SHA-512
    secretRef: redpanda-superusers
    users: []
clusterDomain: cluster.local
commonLabels: {}
config:
  cluster:
    default_topic_replications: 3
  node:
    crash_loop_limit: 5
  pandaproxy_client: {}
  rpk: {}
  schema_registry_client: {}
  tunable:
    compacted_log_segment_size: 67108864
    group_topic_partitions: 16
    kafka_batch_max_bytes: 1048576
    kafka_connection_rate_limit: 1000
    log_segment_size: 134217728
    log_segment_size_max: 268435456
    log_segment_size_min: 16777216
    max_compacted_log_segment_size: 536870912
    topic_partitions_per_shard: 1000
connectors:
  auth:
    sasl:
      enabled: false
      mechanism: scram-sha-512
      secretRef: ""
      userName: ""
  commonLabels: {}
  connectors:
    additionalConfiguration: ""
    bootstrapServers: ""
    brokerTLS:
      ca:
        secretNameOverwrite: ""
        secretRef: ""
      cert:
        secretNameOverwrite: ""
        secretRef: ""
      enabled: false
      key:
        secretNameOverwrite: ""
        secretRef: ""
    groupID: connectors-cluster
    producerBatchSize: 131072
    producerLingerMS: 1
    restPort: 8083
    schemaRegistryURL: ""
    secretManager:
      connectorsPrefix: ""
      consolePrefix: ""
      enabled: false
      region: ""
    storage:
      remote:
        read:
          config: false
          offset: false
          status: false
        write:
          config: false
          offset: false
          status: false
      replicationFactor:
        config: -1
        offset: -1
        status: -1
      topic:
        config: _internal_connectors_configs
        offset: _internal_connectors_offsets
        status: _internal_connectors_status
  container:
    javaGCLogEnabled: "false"
    resources:
      javaMaxHeapSize: 2G
      limits:
        cpu: 1
        memory: 2350Mi
      request:
        cpu: 1
        memory: 2350Mi
    securityContext:
      allowPrivilegeEscalation: false
  deployment:
    annotations: {}
    budget:
      maxUnavailable: 1
    create: false
    extraEnv: []
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    nodeAffinity: {}
    nodeSelector: {}
    podAffinity: {}
    podAntiAffinity:
      custom: {}
      topologyKey: kubernetes.io/hostname
      type: hard
      weight: 100
    priorityClassName: ""
    progressDeadlineSeconds: 600
    readinessProbe:
      failureThreshold: 2
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 3
      timeoutSeconds: 5
    restartPolicy: Always
    revisionHistoryLimit: 10
    schedulerName: ""
    securityContext:
      fsGroup: 101
      fsGroupChangePolicy: OnRootMismatch
      runAsUser: 101
    strategy:
      type: RollingUpdate
    terminationGracePeriodSeconds: 300
    tolerations: []
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
    updateStrategy:
      type: RollingUpdate
  enabled: true
  fullnameOverride: ""
  global: {}
  image:
    pullPolicy: IfNotPresent
    repository: docker.redpanda.com/redpandadata/connectors
    tag: ""
  imagePullSecrets: []
  logging:
    level: warn
  monitoring:
    annotations: {}
    enabled: false
    labels: {}
    namespaceSelector:
      any: true
    scrapeInterval: 30s
  nameOverride: test-name-2
  service:
    annotations: {}
    name: ""
    ports:
    - name: prometheus
      port: 9404
  serviceAccount:
    annotations: {}
    create: false
    name: ""
  storage:
    volume:
    - emptyDir:
        medium: Memory
        sizeLimit: 5Mi
      name: rp-connect-tmp
    volumeMounts:
    - mountPath: /tmp
      name: rp-connect-tmp
  test:
    create: false
  tolerations: []
console:
  affinity: {}
  annotations: {}
  autoscaling:
    enabled: false
    maxReplicas: 100
    minReplicas: 1
    targetCPUUtilizationPercentage: 80
  config: {}
  configmap:
    create: false
  console:
    config: {}
  deployment:
    create: false
  enabled: true
  enterprise:
    licenseSecretRef:
      key: ""
      name: ""
  extraContainers: []
  extraEnv: []
  extraEnvFrom: []
  extraVolumeMounts: []
  extraVolumes: []
  fullnameOverride: ""
  global: {}
  image:
    pullPolicy: IfNotPresent
    registry: docker.redpanda.com
    repository: redpandadata/console
    tag: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    className: ""
    enabled: false
    hosts:
    - host: chart-example.local
      paths:
      - path: /
        pathType: ImplementationSpecific
    tls: []
  initContainers:
    extraInitContainers: ""
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 0
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  podSecurityContext:
    fsGroup: 99
    runAsUser: 99
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  replicaCount: 1
  resources: {}
  secret:
    create: false
    enterprise: {}
    kafka: {}
    login:
      github: {}
      google: {}
      jwtSecret: ""
      oidc: {}
      okta: {}
    redpanda:
      adminApi: {}
  secretMounts: []
  securityContext:
    runAsNonRoot: true
  service:
    annotations: {}
    port: 8080
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  tolerations: []
  topologySpreadConstraints: {}
enterprise:
  license: ""
  licenseSecretRef: {}
external:
  domain: customredpandadomain.local
  enabled: true
  service:
    enabled: true
  type: NodePort
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: ""
imagePullSecrets: []
license_key: ""
license_secret_ref: {}
listeners:
  admin:
    external:
      default:
        advertisedPorts:
        - 31644
        port: 9645
        tls:
          cert: external
    port: 9644
    tls:
      cert: default
      requireClientAuth: true
  http:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30082
        authenticationMethod: null
        port: 8083
        tls:
          cert: external
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8082
    tls:
      cert: default
      requireClientAuth: false
  kafka:
    authenticationMethod: null
    external:
      default:
        advertisedPorts:
        - 31092
        authenticationMethod: null
        port: 9094
        tls:
          cert: external
    port: 9093
    tls:
      cert: default
      requireClientAuth: false
  rpc:
    port: 33145
    tls:
      cert: default
      requireClientAuth: false
  schemaRegistry:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30081
        authenticationMethod: null
        port: 8084
        tls:
          cert: external
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8081
    tls:
      cert: default
      requireClientAuth: false
logging:
  logLevel: info
  usageStats:
    enabled: true
monitoring:
  enabled: false
  labels: {}
  scrapeInterval: 30s
  tlsConfig: {}
nameOverride: rp-test
nodeSelector: {}
post_install_job:
  affinity: {}
  enabled: true
post_upgrade_job:
  affinity: {}
  enabled: true
rackAwareness:
  enabled: false
  nodeAnnotation: topology.kubernetes.io/zone
rbac:
  annotations: {}
  enabled: false
resources:
  cpu:
    cores: 1
  memory:
    container:
      max: 2.5Gi
serviceAccount:
  annotations: {}
  create: false
  name: ""
statefulset:
  additionalRedpandaCmdFlags: []
  annotations: {}
  budget:
    maxUnavailable: 1
  extraVolumeMounts: ""
  extraVolumes: ""
  initContainerImage:
    repository: busybox
    tag: latest
  initContainers:
    configurator:
      extraVolumeMounts: ""
      resources: {}
    extraInitContainers: ""
    setDataDirOwnership:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
    setTieredStorageCacheDirOwnership:
      extraVolumeMounts: ""
      resources: {}
    tuning:
      extraVolumeMounts: ""
      resources: {}
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
  nodeSelector: {}
  podAffinity: {}
  podAntiAffinity:
    custom: {}
    topologyKey: kubernetes.io/hostname
    type: hard
    weight: 100
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
  replicas: 3
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsUser: 101
  sideCars:
    configWatcher:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
      securityContext: {}
    controllers:
      createRBAC: true
      enabled: false
      healthProbeAddress: :8085
      image:
        repository: docker.redpanda.com/redpandadata/redpanda-operator
        tag: v23.2.8
      metricsAddress: :9082
      resources: {}
      run:
      - all
      securityContext: {}
  startupProbe:
    failureThreshold: 120
    initialDelaySeconds: 1
    periodSeconds: 10
  terminationGracePeriodSeconds: 90
  tolerations: []
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  updateStrategy:
    type: RollingUpdate
storage:
  hostPath: ""
  persistentVolume:
    annotations: {}
    enabled: true
    labels: {}
    size: 20Gi
    storageClass: ""
  tiered:
    config:
      cloud_storage_access_key: ""
      cloud_storage_api_endpoint: ""
      cloud_storage_azure_container: null
      cloud_storage_azure_shared_key: null
      cloud_storage_azure_storage_account: null
      cloud_storage_bucket: ""
      cloud_storage_cache_size: 5368709120
      cloud_storage_credentials_source: config_file
      cloud_storage_enable_remote_read: true
      cloud_storage_enable_remote_write: true
      cloud_storage_enabled: false
      cloud_storage_region: ""
      cloud_storage_secret_key: ""
    hostPath: ""
    mountType: emptyDir
    persistentVolume:
      annotations: {}
      labels: {}
      storageClass: ""
tls:
  certs:
    default:
      caEnabled: true
    external:
      caEnabled: true
  enabled: true
tolerations: []
tuning:
  tune_aio_events: true

Anything else we need to know?

No response

Which are the affected charts?

No response

Chart Version(s)

$ helm -n <redpanda-release-namespace> list 
redpanda-5.6.34	v23.2.13

Cloud provider

kind

JIRA Link: K8S-71

JakeSCahill avatar Oct 27 '23 15:10 JakeSCahill

When the configuration changes, the schema server in Redpanda is supposed to restart. It doesn't, which causes this issue (see redpanda issue # ).

#846 will provide a workaround for this and should resolve this issue.

joejulian avatar Nov 01 '23 22:11 joejulian

Just tested again with redpanda-5.6.63 v23.2.18:

export DOMAIN=customredpandadomain.local && \
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true \
  --set connectors.enabled=true \
  --set listeners.admin.tls.requireClientAuth=true \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef=redpanda-superusers

Console refuses to start up:

kubectl logs redpanda-console-5dd6bdd548-mc5h7 -n redpanda
{"level":"info","ts":"2023-12-18T14:05:07.538Z","msg":"started Redpanda Console","version":"v2.3.8","built_at":"1701900386"}
{"level":"info","ts":"2023-12-18T14:05:07.539Z","msg":"connecting to Kafka seed brokers, trying to fetch cluster metadata"}
{"level":"info","ts":"2023-12-18T14:05:07.549Z","msg":"successfully connected to kafka cluster","advertised_broker_count":3,"topic_count":5,"controller_id":0,"kafka_version":"unknown custom version at least v0.11.0"}
{"level":"info","ts":"2023-12-18T14:05:07.549Z","msg":"creating schema registry client and testing connectivity"}
{"level":"info","ts":"2023-12-18T14:05:07.557Z","msg":"successfully tested schema registry connectivity"}
{"level":"info","ts":"2023-12-18T14:05:07.557Z","msg":"testing admin client connectivity","urls":["https://redpanda.redpanda.svc.cluster.local.:9644"]}
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
{"level":"fatal","ts":"2023-12-18T14:05:10.630Z","msg":"failed to create Redpanda service","error":"failed to test admin client connectivity: Get \"https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers\": remote error: tls: certificate required"}

Secrets available:

kubectl get secret -n redpanda                                          
NAME                                 TYPE                 DATA   AGE
redpanda-client                      kubernetes.io/tls    3      6m29s
redpanda-config-watcher              Opaque               1      6m34s
redpanda-configurator                Opaque               1      6m34s
redpanda-default-cert                kubernetes.io/tls    3      6m29s
redpanda-default-root-certificate    kubernetes.io/tls    3      6m31s
redpanda-external-cert               kubernetes.io/tls    3      6m29s
redpanda-external-root-certificate   kubernetes.io/tls    3      6m31s
redpanda-sts-lifecycle               Opaque               3      6m34s
redpanda-superusers                  Opaque               1      7m33s
sh.helm.release.v1.redpanda.v1       helm.sh/release.v1   1      6m34s
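
A rough way to check whether the Admin API accepts the certificate in the redpanda-client secret is to call /v1/brokers with and without it. The sketch below rests on several assumptions that are not confirmed in this thread: that the secret holds the keys ca.crt, tls.crt and tls.key, that the Service is named redpanda, and that the server certificate is valid for the in-cluster name used in the Console logs. The function names are hypothetical.

```shell
# Assumed names, taken from the URLs and secret list in this thread.
NS=redpanda
HOST=redpanda.redpanda.svc.cluster.local

# Decode the (assumed) keys of the client secret into local files.
fetch_client_cert() {
  kubectl get secret redpanda-client -n "$NS" -o 'jsonpath={.data.ca\.crt}'  | base64 -d > ca.crt
  kubectl get secret redpanda-client -n "$NS" -o 'jsonpath={.data.tls\.crt}' | base64 -d > tls.crt
  kubectl get secret redpanda-client -n "$NS" -o 'jsonpath={.data.tls\.key}' | base64 -d > tls.key
}

# Call the Admin API through a local port-forward, first without and then
# with the client certificate. --resolve keeps the in-cluster hostname for
# TLS verification while sending the traffic to 127.0.0.1.
check_admin_api() {
  kubectl port-forward -n "$NS" svc/redpanda 9644:9644 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2

  echo "without client cert:"
  curl -sS --max-time 10 --cacert ca.crt \
    --resolve "$HOST:9644:127.0.0.1" "https://$HOST:9644/v1/brokers"

  echo "with client cert:"
  curl -sS --max-time 10 --cacert ca.crt --cert tls.crt --key tls.key \
    --resolve "$HOST:9644:127.0.0.1" "https://$HOST:9644/v1/brokers"

  kill "$pf_pid"
}

# Usage (needs access to the cluster):
#   fetch_client_cert && check_admin_api
```

If the second call succeeds while the first is rejected, the certificate itself is usable, which would point at Console not being configured to present it.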

JakeSCahill avatar Dec 18 '23 14:12 JakeSCahill

Just tested this again:

helm repo update
helm install redpanda redpanda/redpanda \
  --version 5.9.0 \
  --namespace jake \
  --create-namespace \
  --set external.domain=customredpandadomain.local \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true \
  --set "auth.sasl.users[0].name=superuser" \
  --set auth.sasl.enabled=true \
  --set "auth.sasl.users[0].password=secretpassword" \
  --set config.cluster.admin_api_require_auth=true \
  --set "auth.sasl.users[0].mechanism=SCRAM-SHA-512" \
  --set listeners.admin.tls.requireClientAuth=true

Console goes into a crash loop:

{"level":"info","ts":"2024-08-09T13:22:04.243Z","msg":"started Redpanda Console","version":"v2.4.6","built_at":"1712675285"}
{"level":"info","ts":"2024-08-09T13:22:04.243Z","msg":"testing admin client connectivity","urls":["https://redpanda.jake.svc.cluster.local/.:9644"]}
{"level":"info","ts":"2024-08-09T13:22:04.248Z","msg":"successfully tested the Redpanda admin connectivity","broker_count":3,"cluster_version":"Redpanda v24.2.2"}
{"level":"info","ts":"2024-08-09T13:22:04.249Z","msg":"connecting to Kafka seed brokers, trying to fetch cluster metadata"}
{"level":"info","ts":"2024-08-09T13:22:04.257Z","msg":"successfully connected to kafka cluster","advertised_broker_count":3,"topic_count":5,"controller_id":0,"kafka_version":"unknown custom version at least v0.11.0"}
{"level":"info","ts":"2024-08-09T13:22:04.257Z","msg":"creating schema registry client and testing connectivity"}
{"level":"fatal","ts":"2024-08-09T13:22:14.257Z","msg":"failed to create console service","error":"failed to create kafka svc: failed to verify connectivity to schema registry: Get \"https://redpanda-0.redpanda.jake.svc.cluster.local/.:8081/subjects\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}

Console configmap:

Name:         redpanda-console
Namespace:    jake
Labels:       app.kubernetes.io/instance=redpanda
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=console
              app.kubernetes.io/version=v2.4.6
              helm.sh/chart=console-0.7.26
Annotations:  meta.helm.sh/release-name: redpanda
              meta.helm.sh/release-namespace: jake

Data
====
config.yaml:
----
# from .Values.console.config
kafka:
  brokers:
  - redpanda-0.redpanda.jake.svc.cluster.local.:9093
  - redpanda-1.redpanda.jake.svc.cluster.local.:9093
  - redpanda-2.redpanda.jake.svc.cluster.local.:9093
  sasl:
    enabled: true
  schemaRegistry:
    enabled: true
    tls:
      caFilepath: /mnt/cert/schemaregistry/default/ca.crt
      certFilepath: ""
      enabled: true
      insecureSkipTlsVerify: false
      keyFilepath: ""
    urls:
    - https://redpanda-0.redpanda.jake.svc.cluster.local/.:8081
    - https://redpanda-1.redpanda.jake.svc.cluster.local/.:8081
    - https://redpanda-2.redpanda.jake.svc.cluster.local/.:8081
  tls:
    caFilepath: /mnt/cert/kafka/default/ca.crt
    certFilepath: ""
    enabled: true
    insecureSkipTlsVerify: false
    keyFilepath: ""
redpanda:
  adminApi:
    enabled: true
    tls:
      caFilepath: /mnt/cert/adminapi/default/ca.crt
      certFilepath: /mnt/cert/adminapi/default/tls.crt
      enabled: true
      insecureSkipTlsVerify: false
      keyFilepath: /mnt/cert/adminapi/default/tls.key
    urls:
    - https://redpanda.jake.svc.cluster.local/.:9644


BinaryData
====

JakeSCahill avatar Aug 09 '24 14:08 JakeSCahill

Related to https://github.com/redpanda-data/helm-charts/issues/848

JakeSCahill avatar Aug 22 '24 15:08 JakeSCahill