Issues with Console and Connectors when mTLS is enabled for the Admin API

Open JakeSCahill opened this issue 1 year ago • 4 comments

What happened?

When mTLS is enabled for the Admin API, Console and Connectors fail to start. Console's requests to the Admin API are rejected because it does not present a client certificate:

{"level":"info","ts":"2023-10-27T15:11:53.281Z","msg":"testing admin client connectivity","urls":["https://redpanda.redpanda.svc.cluster.local.:9644"]}
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
Retrying GET for error: Get "https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers": remote error: tls: certificate required
{"level":"fatal","ts":"2023-10-27T15:11:56.352Z","msg":"failed to create Redpanda service","error":"failed to test admin client connectivity: Get \"https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers\": remote error: tls: certificate required"}

If I try to disable mTLS after enabling it, the post-upgrade job fails with Error: UPGRADE FAILED: post-upgrade hooks failed: job failed: BackoffLimitExceeded.

Post-upgrade logs:

Request error, trying another node: Get "https://redpanda-0.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": remote error: tls: certificate required
Request error, trying another node: Get "https://redpanda-1.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": remote error: tls: certificate required
unable to query config schema: Get "https://redpanda-2.redpanda.redpanda.svc.cluster.local.:9644/v1/cluster_config/schema": dial tcp 10.244.2.3:9644: connect: connection refused
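
Both sets of logs above show the Admin API rejecting clients with "tls: certificate required". One way to check that the missing client certificate is the cause, rather than Console or the post-upgrade job themselves, is to call the same endpoint by hand with and without a client certificate. This is a minimal sketch, not something from the chart: the certificate paths are hypothetical placeholders, and it assumes curl is available wherever the printed commands are run (for example, inside a pod that mounts the certificates).

```shell
# Admin API endpoint taken from the Console logs above.
ADMIN_URL="https://redpanda.redpanda.svc.cluster.local.:9644/v1/brokers"

# Hypothetical paths: replace with wherever your CA and client key pair are mounted.
CA_CERT="${CA_CERT:-/path/to/ca.crt}"
CLIENT_CERT="${CLIENT_CERT:-/path/to/tls.crt}"
CLIENT_KEY="${CLIENT_KEY:-/path/to/tls.key}"

# Print the curl command for the Admin API.
# Pass "mtls" to include a client certificate and key; anything else omits them.
admin_curl_cmd() {
  if [ "$1" = "mtls" ]; then
    echo "curl -sS --cacert $CA_CERT --cert $CLIENT_CERT --key $CLIENT_KEY $ADMIN_URL"
  else
    echo "curl -sS --cacert $CA_CERT $ADMIN_URL"
  fi
}

# Print both variants so their results can be compared side by side.
admin_curl_cmd plain
admin_curl_cmd mtls
```

With requireClientAuth set to true, the first command would be expected to fail during the TLS handshake, much like the requests in the logs, while the second should succeed only if the client certificate is signed by a CA that the listener trusts.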

If I re-enable mTLS, Console starts running, but there are issues with Admin API connections.

https://github.com/redpanda-data/helm-charts/assets/45230295/62e3ea6f-cdb4-4eb0-93ca-8a12f9f0ddd7

What did you expect to happen?

Redpanda Console and Connectors should work even if mTLS is enabled.

How can we reproduce it (as minimally and precisely as possible)? Please include values file.

Running in a kind cluster. I had a few overrides as I was testing a few things.

To enable mTLS with Connectors enabled:

export DOMAIN=customredpandadomain.local
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true \
  --set connectors.enabled=true \
  --set connectors.deployment.terminationGracePeriodSeconds=300 \
  --set connectors.nameOverride="test-name-2" \
  --set nameOverride="rp-test" \
  --set listeners.admin.tls.requireClientAuth=true \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef=redpanda-superusers
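
Since the template asks for a values file, the --set flags above can also be written as one. The following is a sketch of an equivalent file: every key and value is taken from the flags in the command, but the file name mtls-values.yaml is an assumption made for illustration.

```shell
# Write the overrides from the --set flags to a values file.
# mtls-values.yaml is a hypothetical file name.
cat > mtls-values.yaml <<'EOF'
external:
  domain: customredpandadomain.local
statefulset:
  initContainers:
    setDataDirOwnership:
      enabled: true
connectors:
  enabled: true
  nameOverride: test-name-2
  deployment:
    terminationGracePeriodSeconds: 300
nameOverride: rp-test
listeners:
  admin:
    tls:
      requireClientAuth: true   # the setting that enables mTLS on the Admin API
auth:
  sasl:
    enabled: true
    secretRef: redpanda-superusers
EOF

# Then pass the file to Helm instead of the individual --set flags:
# helm upgrade --install redpanda redpanda/redpanda \
#   --namespace redpanda --create-namespace \
#   --values mtls-values.yaml
```

To reproduce the "disable mTLS" step with the same file, remove the listeners block (or set requireClientAuth to false) and run the upgrade again.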

To try to disable mTLS:

export DOMAIN=customredpandadomain.local
helm repo add redpanda https://charts.redpanda.com/
helm repo update
helm upgrade --install redpanda redpanda/redpanda \
  --namespace redpanda \
  --create-namespace \
  --set external.domain=${DOMAIN} \
  --set statefulset.initContainers.setDataDirOwnership.enabled=true \
  --set connectors.enabled=true \
  --set connectors.deployment.terminationGracePeriodSeconds=300 \
  --set connectors.nameOverride="test-name-2" \
  --set nameOverride="rp-test" \
  --set auth.sasl.enabled=true \
  --set auth.sasl.secretRef=redpanda-superusers

$ helm get values <redpanda-release-name> -n <redpanda-release-namespace> --all
COMPUTED VALUES:
affinity: {}
auth:
  sasl:
    enabled: true
    mechanism: SCRAM-SHA-512
    secretRef: redpanda-superusers
    users: []
clusterDomain: cluster.local
commonLabels: {}
config:
  cluster:
    default_topic_replications: 3
  node:
    crash_loop_limit: 5
  pandaproxy_client: {}
  rpk: {}
  schema_registry_client: {}
  tunable:
    compacted_log_segment_size: 67108864
    group_topic_partitions: 16
    kafka_batch_max_bytes: 1048576
    kafka_connection_rate_limit: 1000
    log_segment_size: 134217728
    log_segment_size_max: 268435456
    log_segment_size_min: 16777216
    max_compacted_log_segment_size: 536870912
    topic_partitions_per_shard: 1000
connectors:
  auth:
    sasl:
      enabled: false
      mechanism: scram-sha-512
      secretRef: ""
      userName: ""
  commonLabels: {}
  connectors:
    additionalConfiguration: ""
    bootstrapServers: ""
    brokerTLS:
      ca:
        secretNameOverwrite: ""
        secretRef: ""
      cert:
        secretNameOverwrite: ""
        secretRef: ""
      enabled: false
      key:
        secretNameOverwrite: ""
        secretRef: ""
    groupID: connectors-cluster
    producerBatchSize: 131072
    producerLingerMS: 1
    restPort: 8083
    schemaRegistryURL: ""
    secretManager:
      connectorsPrefix: ""
      consolePrefix: ""
      enabled: false
      region: ""
    storage:
      remote:
        read:
          config: false
          offset: false
          status: false
        write:
          config: false
          offset: false
          status: false
      replicationFactor:
        config: -1
        offset: -1
        status: -1
      topic:
        config: _internal_connectors_configs
        offset: _internal_connectors_offsets
        status: _internal_connectors_status
  container:
    javaGCLogEnabled: "false"
    resources:
      javaMaxHeapSize: 2G
      limits:
        cpu: 1
        memory: 2350Mi
      request:
        cpu: 1
        memory: 2350Mi
    securityContext:
      allowPrivilegeEscalation: false
  deployment:
    annotations: {}
    budget:
      maxUnavailable: 1
    create: false
    extraEnv: []
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    nodeAffinity: {}
    nodeSelector: {}
    podAffinity: {}
    podAntiAffinity:
      custom: {}
      topologyKey: kubernetes.io/hostname
      type: hard
      weight: 100
    priorityClassName: ""
    progressDeadlineSeconds: 600
    readinessProbe:
      failureThreshold: 2
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 3
      timeoutSeconds: 5
    restartPolicy: Always
    revisionHistoryLimit: 10
    schedulerName: ""
    securityContext:
      fsGroup: 101
      fsGroupChangePolicy: OnRootMismatch
      runAsUser: 101
    strategy:
      type: RollingUpdate
    terminationGracePeriodSeconds: 300
    tolerations: []
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
    updateStrategy:
      type: RollingUpdate
  enabled: true
  fullnameOverride: ""
  global: {}
  image:
    pullPolicy: IfNotPresent
    repository: docker.redpanda.com/redpandadata/connectors
    tag: ""
  imagePullSecrets: []
  logging:
    level: warn
  monitoring:
    annotations: {}
    enabled: false
    labels: {}
    namespaceSelector:
      any: true
    scrapeInterval: 30s
  nameOverride: test-name-2
  service:
    annotations: {}
    name: ""
    ports:
    - name: prometheus
      port: 9404
  serviceAccount:
    annotations: {}
    create: false
    name: ""
  storage:
    volume:
    - emptyDir:
        medium: Memory
        sizeLimit: 5Mi
      name: rp-connect-tmp
    volumeMounts:
    - mountPath: /tmp
      name: rp-connect-tmp
  test:
    create: false
  tolerations: []
console:
  affinity: {}
  annotations: {}
  autoscaling:
    enabled: false
    maxReplicas: 100
    minReplicas: 1
    targetCPUUtilizationPercentage: 80
  config: {}
  configmap:
    create: false
  console:
    config: {}
  deployment:
    create: false
  enabled: true
  enterprise:
    licenseSecretRef:
      key: ""
      name: ""
  extraContainers: []
  extraEnv: []
  extraEnvFrom: []
  extraVolumeMounts: []
  extraVolumes: []
  fullnameOverride: ""
  global: {}
  image:
    pullPolicy: IfNotPresent
    registry: docker.redpanda.com
    repository: redpandadata/console
    tag: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    className: ""
    enabled: false
    hosts:
    - host: chart-example.local
      paths:
      - path: /
        pathType: ImplementationSpecific
    tls: []
  initContainers:
    extraInitContainers: ""
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 0
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  podSecurityContext:
    fsGroup: 99
    runAsUser: 99
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  replicaCount: 1
  resources: {}
  secret:
    create: false
    enterprise: {}
    kafka: {}
    login:
      github: {}
      google: {}
      jwtSecret: ""
      oidc: {}
      okta: {}
    redpanda:
      adminApi: {}
  secretMounts: []
  securityContext:
    runAsNonRoot: true
  service:
    annotations: {}
    port: 8080
    type: ClusterIP
  serviceAccount:
    annotations: {}
    create: true
    name: ""
  tolerations: []
  topologySpreadConstraints: {}
enterprise:
  license: ""
  licenseSecretRef: {}
external:
  domain: customredpandadomain.local
  enabled: true
  service:
    enabled: true
  type: NodePort
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: ""
imagePullSecrets: []
license_key: ""
license_secret_ref: {}
listeners:
  admin:
    external:
      default:
        advertisedPorts:
        - 31644
        port: 9645
        tls:
          cert: external
    port: 9644
    tls:
      cert: default
      requireClientAuth: true
  http:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30082
        authenticationMethod: null
        port: 8083
        tls:
          cert: external
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8082
    tls:
      cert: default
      requireClientAuth: false
  kafka:
    authenticationMethod: null
    external:
      default:
        advertisedPorts:
        - 31092
        authenticationMethod: null
        port: 9094
        tls:
          cert: external
    port: 9093
    tls:
      cert: default
      requireClientAuth: false
  rpc:
    port: 33145
    tls:
      cert: default
      requireClientAuth: false
  schemaRegistry:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30081
        authenticationMethod: null
        port: 8084
        tls:
          cert: external
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8081
    tls:
      cert: default
      requireClientAuth: false
logging:
  logLevel: info
  usageStats:
    enabled: true
monitoring:
  enabled: false
  labels: {}
  scrapeInterval: 30s
  tlsConfig: {}
nameOverride: rp-test
nodeSelector: {}
post_install_job:
  affinity: {}
  enabled: true
post_upgrade_job:
  affinity: {}
  enabled: true
rackAwareness:
  enabled: false
  nodeAnnotation: topology.kubernetes.io/zone
rbac:
  annotations: {}
  enabled: false
resources:
  cpu:
    cores: 1
  memory:
    container:
      max: 2.5Gi
serviceAccount:
  annotations: {}
  create: false
  name: ""
statefulset:
  additionalRedpandaCmdFlags: []
  annotations: {}
  budget:
    maxUnavailable: 1
  extraVolumeMounts: ""
  extraVolumes: ""
  initContainerImage:
    repository: busybox
    tag: latest
  initContainers:
    configurator:
      extraVolumeMounts: ""
      resources: {}
    extraInitContainers: ""
    setDataDirOwnership:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
    setTieredStorageCacheDirOwnership:
      extraVolumeMounts: ""
      resources: {}
    tuning:
      extraVolumeMounts: ""
      resources: {}
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
  nodeSelector: {}
  podAffinity: {}
  podAntiAffinity:
    custom: {}
    topologyKey: kubernetes.io/hostname
    type: hard
    weight: 100
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
  replicas: 3
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsUser: 101
  sideCars:
    configWatcher:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
      securityContext: {}
    controllers:
      createRBAC: true
      enabled: false
      healthProbeAddress: :8085
      image:
        repository: docker.redpanda.com/redpandadata/redpanda-operator
        tag: v23.2.8
      metricsAddress: :9082
      resources: {}
      run:
      - all
      securityContext: {}
  startupProbe:
    failureThreshold: 120
    initialDelaySeconds: 1
    periodSeconds: 10
  terminationGracePeriodSeconds: 90
  tolerations: []
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  updateStrategy:
    type: RollingUpdate
storage:
  hostPath: ""
  persistentVolume:
    annotations: {}
    enabled: true
    labels: {}
    size: 20Gi
    storageClass: ""
  tiered:
    config:
      cloud_storage_access_key: ""
      cloud_storage_api_endpoint: ""
      cloud_storage_azure_container: null
      cloud_storage_azure_shared_key: null
      cloud_storage_azure_storage_account: null
      cloud_storage_bucket: ""
      cloud_storage_cache_size: 5368709120
      cloud_storage_credentials_source: config_file
      cloud_storage_enable_remote_read: true
      cloud_storage_enable_remote_write: true
      cloud_storage_enabled: false
      cloud_storage_region: ""
      cloud_storage_secret_key: ""
    hostPath: ""
    mountType: emptyDir
    persistentVolume:
      annotations: {}
      labels: {}
      storageClass: ""
tls:
  certs:
    default:
      caEnabled: true
    external:
      caEnabled: true
  enabled: true
tolerations: []
tuning:
  tune_aio_events: true

Anything else we need to know?

No response

Which are the affected charts?

No response

Chart Version(s)

$ helm -n <redpanda-release-namespace> list 
redpanda-5.6.34	v23.2.13

Cloud provider

kind

JIRA Link: K8S-71

JakeSCahill, Oct 27 '23 15:10