
🔹🐛 Operator default listeners seem to be clashing with user provided ports

Open c4milo opened this issue 10 months ago • 9 comments

What happened?

I used ports 30081 and 30082 in "external" listeners and the operator complained that the port was already used:

manager {"level":"debug","ts":"2024-04-01T21:32:22.924Z","logger":"events","msg":"Helm upgrade failed for release redpanda/redpanda with chart [email protected]: failed to create resource: Service \"redpanda-external\" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated\n\nLast Helm logs:\n\n2024-04-01T21:32:22.501865617Z: Created a new PodDisruptionBudget called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.522146873Z: Created a new ServiceAccount called \"id-rpcloud-9m4e2mr0ui3e8a215n4\" in redpanda\n\n2024-04-01T21:32:22.540660454Z: Created a new Secret called \"redpanda-sts-lifecycle\" in redpanda\n\n2024-04-01T21:32:22.557186641Z: Created a new Secret called \"redpanda-config-watcher\" in redpanda\n\n2024-04-01T21:32:22.576038741Z: Created a new Secret called \"redpanda-configurator\" in redpanda\n\n2024-04-01T21:32:22.592036011Z: Created a new Secret called \"redpanda-fs-validator\" in redpanda\n\n2024-04-01T21:32:22.610714711Z: Created a new ConfigMap called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.626412621Z: Created a new ConfigMap called \"redpanda-rpk\" in redpanda\n\n2024-04-01T21:32:22.64889355Z: Created a new Service called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.778871216Z: warning: Upgrade \"redpanda\" failed: failed to create resource: Service \"redpanda-external\" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated","type":"Warning","object":{"kind":"HelmRelease","namespace":"redpanda","name":"redpanda","uid":"9b7006ec-60b7-496b-b21c-0ee3064f8e6d","apiVersion":"helm.toolkit.fluxcd.io/v2beta2","resourceVersion":"11709469"},"reason":"UpgradeFailed"}

What did you expect to happen?

If I can provide external ports, I expect the operator to honor them. Any hidden magic is highly undesired.

How can we reproduce it (as minimally and precisely as possible)? Please include the values file.

$ helm get values <redpanda-release-name> -n <redpanda-release-namespace> --all
COMPUTED VALUES:
affinity: {}
auditLogging:
  clientMaxBufferSize: 16777216
  enabled: false
  enabledEventTypes: null
  excludedPrincipals: null
  excludedTopics: null
  listener: internal
  partitions: 12
  queueDrainIntervalMs: 500
  queueMaxBufferSizePerShard: 1048576
  replicationFactor: null
auth:
  sasl:
    enabled: false
    mechanism: SCRAM-SHA-512
    secretRef: redpanda/redpanda-superusers
    users: []
clusterDomain: cluster.local
commonLabels: {}
config:
  cluster:
    cloud_storage_azure_container: 9m4e2mr0ui3e8a215n4g
    cloud_storage_azure_storage_account: testcamilo9
    cloud_storage_credentials_source: azure_aks_oidc_federation
    cloud_storage_enable_remote_read: "true"
    cloud_storage_enable_remote_write: "true"
    cloud_storage_enabled: "false"
    default_topic_replications: "3"
    minimum_topic_replications: "3"
  node:
    crash_loop_limit: 5
  pandaproxy_client: {}
  rpk: {}
  schema_registry_client: {}
  tunable:
    compacted_log_segment_size: 67108864
    group_topic_partitions: 16
    kafka_batch_max_bytes: 1048576
    kafka_connection_rate_limit: 1000
    log_segment_size: 134217728
    log_segment_size_max: 268435456
    log_segment_size_min: 16777216
    max_compacted_log_segment_size: 536870912
    topic_partitions_per_shard: 1000
connectors:
  deployment:
    create: false
  enabled: false
  test:
    create: false
console:
  config: {}
  configmap:
    create: false
  deployment:
    create: false
  enabled: false
  secret:
    create: false
enterprise:
  license: ""
  licenseSecretRef:
    key: license
    name: redpanda-9m4e2mr0ui3e8a215n4g-license
external:
  addresses:
  - $PREFIX_TEMPLATE
  domain: camilo.panda.dev
  enabled: true
  externalDns:
    enabled: true
  prefixTemplate: rp${POD_ORDINAL}-$(echo -n $HOST_IP_ADDRESS | sha256sum | head -c
    7)
  service:
    enabled: true
  type: NodePort
fullnameOverride: ""
image:
  pullPolicy: IfNotPresent
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: v23.3.7
imagePullSecrets: []
license_key: ""
license_secret_ref: {}
listeners:
  admin:
    external:
      admin-api:
        advertisedPorts:
        - 30644
        authenticationMethod: sasl
        enabled: false
        port: 30644
        tls:
          cert: letsencrypt
          enabled: true
          requireClientAuth: false
      default:
        advertisedPorts:
        - 31644
        port: 9645
        tls:
          cert: external
    port: 9644
    tls:
      cert: letsencrypt
      enabled: true
      requireClientAuth: false
  http:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30082
        authenticationMethod: null
        port: 8083
        tls:
          cert: external
          requireClientAuth: false
      http-proxy:
        advertisedPorts:
        - 30082
        authenticationMethod: http_basic
        enabled: true
        port: 30082
        tls:
          cert: letsencrypt
          enabled: true
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8082
    prefixTemplate: http-proxy$POD_ORDINAL
    tls:
      cert: letsencrypt
      enabled: true
      requireClientAuth: false
  kafka:
    authenticationMethod: sasl
    external:
      default:
        advertisedPorts:
        - 31092
        authenticationMethod: null
        port: 9094
        tls:
          cert: external
      kafka-api:
        advertisedPorts:
        - 30092
        authenticationMethod: sasl
        enabled: true
        port: 30092
        tls:
          cert: letsencrypt
          requireClientAuth: false
    port: 9092
    prefixTemplate: kafka-api$POD_ORDINAL
    tls:
      cert: letsencrypt
      requireClientAuth: false
  rpc:
    port: 33145
    tls:
      cert: letsencrypt
      requireClientAuth: false
  schemaRegistry:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30081
        authenticationMethod: null
        port: 8084
        tls:
          cert: external
          requireClientAuth: false
      schema-registry:
        advertisedPorts:
        - 30081
        authenticationMethod: http_basic
        enabled: true
        port: 30081
        tls:
          cert: letsencrypt
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8081
    tls:
      cert: letsencrypt
      requireClientAuth: false
logging:
  logLevel: debug
  usageStats:
    clusterId: 9m4e2mr0ui3e8a215n4g
    enabled: true
monitoring:
  enabled: false
  labels: {}
  scrapeInterval: 30s
  tlsConfig: {}
nameOverride: ""
nodeSelector: {}
post_install_job:
  affinity: {}
  enabled: true
post_upgrade_job:
  affinity: {}
  enabled: true
rackAwareness:
  enabled: true
  nodeAnnotation: topology.kubernetes.io/zone
rbac:
  annotations: {}
  enabled: false
resources:
  cpu:
    cores: "8"
  memory:
    container:
      max: 2Gi
      min: 2Gi
serviceAccount:
  annotations:
    azure.workload.identity/client-id: c90db393-857d-41d0-ac0d-0e61271fcaa6
  create: true
  name: id-rpcloud-9m4e2mr0ui3e8a215n4
statefulset:
  additionalRedpandaCmdFlags:
  - --abort-on-seastar-bad-alloc
  - --dump-memory-diagnostics-on-alloc-failure-kind=all
  annotations: {}
  budget:
    maxUnavailable: 1
  extraVolumeMounts: ""
  extraVolumes: ""
  initContainerImage:
    repository: busybox
    tag: latest
  initContainers:
    configurator:
      extraVolumeMounts: ""
      resources: {}
    extraInitContainers: ""
    fsValidator:
      enabled: true
      expectedFS: xfs
      extraVolumeMounts: ""
      resources: {}
    setDataDirOwnership:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
    setTieredStorageCacheDirOwnership:
      extraVolumeMounts: ""
      resources: {}
    tuning:
      extraVolumeMounts: ""
      resources: {}
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
  nodeSelector:
    cloud.redpanda.com/role: redpanda
  podAffinity: {}
  podAntiAffinity:
    custom: {}
    topologyKey: kubernetes.io/hostname
    type: hard
    weight: 100
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
  replicas: 3
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsUser: 101
  sideCars:
    configWatcher:
      enabled: true
      extraVolumeMounts: ""
      resources: {}
      securityContext: {}
    controllers:
      createRBAC: true
      enabled: false
      healthProbeAddress: :8085
      image:
        repository: docker.redpanda.com/redpandadata/redpanda-operator
        tag: v2.1.10-23.2.18
      metricsAddress: :9082
      resources: {}
      run:
      - all
      securityContext: {}
  startupProbe:
    failureThreshold: 120
    initialDelaySeconds: 1
    periodSeconds: 10
  terminationGracePeriodSeconds: 90
  tolerations:
  - effect: NoSchedule
    key: cloud.redpanda.com/role
    operator: Equal
    value: redpanda
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  updateStrategy:
    type: RollingUpdate
storage:
  hostPath: ""
  persistentVolume:
    annotations: {}
    enabled: true
    labels: {}
    nameOverwrite: ""
    size: 4096Gi
    storageClass: local-path
  tiered:
    config:
      cloud_storage_access_key: ""
      cloud_storage_api_endpoint: ""
      cloud_storage_azure_container: null
      cloud_storage_azure_shared_key: null
      cloud_storage_azure_storage_account: null
      cloud_storage_bucket: ""
      cloud_storage_cache_size: 5368709120
      cloud_storage_credentials_source: config_file
      cloud_storage_enable_remote_read: true
      cloud_storage_enable_remote_write: true
      cloud_storage_enabled: false
      cloud_storage_region: ""
      cloud_storage_secret_key: ""
    credentialsSecretRef:
      accessKey:
        configurationKey: cloud_storage_access_key
      secretKey:
        configurationKey: cloud_storage_secret_key
    hostPath: ""
    mountType: persistentVolume
    persistentVolume:
      annotations: {}
      labels: {}
      storageClass: local-path
tests:
  enabled: true
tls:
  certs:
    default:
      caEnabled: true
    external:
      caEnabled: true
    letsencrypt:
      caEnabled: false
      duration: 43800h0m0s
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-dns-prod
  enabled: true
tolerations: []
tuning:
  tune_aio_events: true

Anything else we need to know?

No response

Which are the affected charts?

Operator

Chart Version(s)

$ helm -n <redpanda-release-namespace> list 
NAME             	NAMESPACE	REVISION	UPDATED                            	STATUS  	CHART          	APP VERSION
redpanda-operator	redpanda 	2       	2024-04-01 16:29:38.92053 -0400 EDT	deployed	operator-0.4.20	v2.1.15-23.3.7

Cloud provider

Azure

JIRA Link: K8S-129

c4milo avatar Apr 01 '24 21:04 c4milo

Handling node port conflicts is the responsibility of neither the operator nor the Helm chart.

Please attach the output of kubectl get svc -A -o yaml to this issue. I wonder whether a leftover Helm release of the redpanda chart is still in your cluster.

The Redpanda resource spec would also help to solve this issue.

RafalKorepta avatar Apr 02 '24 11:04 RafalKorepta

I've re-wrapped the error messages from Camilo:

{
  "level": "debug",
  "ts": "2024-04-01T21:32:22.924Z",
  "logger": "events",
  "msg": "Helm upgrade failed for release redpanda/redpanda with chart [email protected]: failed to create resource: Service \"redpanda-external\" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated\n\nLast Helm logs:\n\n2024-04-01T21:32:22.501865617Z: Created a new PodDisruptionBudget called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.522146873Z: Created a new ServiceAccount called \"id-rpcloud-9m4e2mr0ui3e8a215n4\" in redpanda\n\n2024-04-01T21:32:22.540660454Z: Created a new Secret called \"redpanda-sts-lifecycle\" in redpanda\n\n2024-04-01T21:32:22.557186641Z: Created a new Secret called \"redpanda-config-watcher\" in redpanda\n\n2024-04-01T21:32:22.576038741Z: Created a new Secret called \"redpanda-configurator\" in redpanda\n\n2024-04-01T21:32:22.592036011Z: Created a new Secret called \"redpanda-fs-validator\" in redpanda\n\n2024-04-01T21:32:22.610714711Z: Created a new ConfigMap called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.626412621Z: Created a new ConfigMap called \"redpanda-rpk\" in redpanda\n\n2024-04-01T21:32:22.64889355Z: Created a new Service called \"redpanda\" in redpanda\n\n2024-04-01T21:32:22.778871216Z: warning: Upgrade \"redpanda\" failed: failed to create resource: Service \"redpanda-external\" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated",
  "type": "Warning",
  "object": {
    "kind": "HelmRelease",
    "namespace": "redpanda",
    "name": "redpanda",
    "uid": "9b7006ec-60b7-496b-b21c-0ee3064f8e6d",
    "apiVersion": "helm.toolkit.fluxcd.io/v2beta2",
    "resourceVersion": "11709469"
  },
  "reason": "UpgradeFailed"
}
Helm upgrade failed for release redpanda/redpanda with chart [email protected]: failed to create resource: Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated

Last Helm logs:

2024-04-01T21:32:22.501865617Z: Created a new PodDisruptionBudget called "redpanda" in redpanda

2024-04-01T21:32:22.522146873Z: Created a new ServiceAccount called "id-rpcloud-9m4e2mr0ui3e8a215n4" in redpanda

2024-04-01T21:32:22.540660454Z: Created a new Secret called "redpanda-sts-lifecycle" in redpanda

2024-04-01T21:32:22.557186641Z: Created a new Secret called "redpanda-config-watcher" in redpanda

2024-04-01T21:32:22.576038741Z: Created a new Secret called "redpanda-configurator" in redpanda

2024-04-01T21:32:22.592036011Z: Created a new Secret called "redpanda-fs-validator" in redpanda

2024-04-01T21:32:22.610714711Z: Created a new ConfigMap called "redpanda" in redpanda

2024-04-01T21:32:22.626412621Z: Created a new ConfigMap called "redpanda-rpk" in redpanda

2024-04-01T21:32:22.64889355Z: Created a new Service called "redpanda" in redpanda

2024-04-01T21:32:22.778871216Z: warning: Upgrade "redpanda" failed: failed to create resource: Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated

chrisseto avatar Apr 02 '24 18:04 chrisseto

Interesting, I am seeing the same behavior

helm upgrade --install redpanda charts/redpanda -n redpanda --create-namespace --values 1124.yaml
Release "redpanda" does not exist. Installing it now.
Error: 1 error occurred:
	* Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated

Templating this shows the following:

# Source: redpanda/templates/services.nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: redpanda-external
  namespace: "redpanda"
  labels:
    app.kubernetes.io/component: redpanda
    app.kubernetes.io/instance: redpanda
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redpanda
    helm.sh/chart: redpanda-5.7.37
spec:
  type: NodePort
  publishNotReadyAddresses: true
  externalTrafficPolicy: Local
  sessionAffinity: None
  ports:
    - name: admin-default
      protocol: TCP
      port: 9645
      nodePort: 31644
    - name: kafka-default
      protocol: TCP
      port: 9094
      nodePort: 31092
    - name: kafka-kafka-api
      protocol: TCP
      port: 30092
      nodePort: 30092
    - name: http-default
      protocol: TCP
      port: 8083
      nodePort: 30082
    - name: http-http-proxy
      protocol: TCP
      port: 30082
      nodePort: 30082
    - name: schema-default
      protocol: TCP
      port: 8084
      nodePort: 30081
    - name: schema-schema-registry
      protocol: TCP
      port: 30081
      nodePort: 30081
  selector:
    app.kubernetes.io/name: redpanda
    app.kubernetes.io/instance: "redpanda"
    app.kubernetes.io/component: redpanda-statefulset

I think the problem is that there are two entries with the same nodePort.

When I apply only the above file to a clean installation:

k apply -f  a.yaml
The Service "redpanda-external" is invalid: spec.ports[4].nodePort: Invalid value: 30082: provided port is already allocated

So I think this is the problem.

alejandroEsc avatar Apr 04 '24 01:04 alejandroEsc
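
The duplicate-nodePort diagnosis above can be checked mechanically. The following is a minimal sketch, not part of the chart or the operator: the port list is copied by hand from the rendered redpanda-external Service, and the helper name duplicate_node_ports is made up for illustration.

```python
from collections import defaultdict

# Ports copied by hand from the rendered redpanda-external Service above.
SERVICE_PORTS = [
    {"name": "admin-default", "port": 9645, "nodePort": 31644},
    {"name": "kafka-default", "port": 9094, "nodePort": 31092},
    {"name": "kafka-kafka-api", "port": 30092, "nodePort": 30092},
    {"name": "http-default", "port": 8083, "nodePort": 30082},
    {"name": "http-http-proxy", "port": 30082, "nodePort": 30082},
    {"name": "schema-default", "port": 8084, "nodePort": 30081},
    {"name": "schema-schema-registry", "port": 30081, "nodePort": 30081},
]


def duplicate_node_ports(ports):
    """Return {nodePort: [port names]} for every nodePort used more than once.

    Hypothetical helper for illustration only.
    """
    by_node_port = defaultdict(list)
    for entry in ports:
        by_node_port[entry["nodePort"]].append(entry["name"])
    return {np: names for np, names in by_node_port.items() if len(names) > 1}


if __name__ == "__main__":
    for node_port, names in sorted(duplicate_node_ports(SERVICE_PORTS).items()):
        print(f"nodePort {node_port} is requested by: {', '.join(names)}")
```

In the rendered list, spec.ports[4] is http-http-proxy, the second entry that asks for 30082, which matches the index in the error message. The schema registry pair repeats 30081 in the same way.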

To make this work, I made the following changes to your values:

  schemaRegistry:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30084

and

  http:
    authenticationMethod: http_basic
    enabled: true
    external:
      default:
        advertisedPorts:
        - 30083
        authenticationMethod: null
        port: 8083

alejandroEsc avatar Apr 04 '24 01:04 alejandroEsc

I believe this is just an input error and no "magic" on our end. Perhaps the port list is a bit confusing; it's something we have wanted to change for a while now.

alejandroEsc avatar Apr 04 '24 01:04 alejandroEsc

@alejandroEsc could we add some validation in that case? This feels like a pretty sharp edge.

chrisseto avatar Apr 04 '24 14:04 chrisseto
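
As a rough idea of what such validation could look like, here is a sketch that walks the listeners section of a values file and reports advertised ports claimed by more than one external listener. It is not the chart's actual logic: the function name is hypothetical, the input is a hand-trimmed copy of the listeners values from this issue, and it assumes a listener is rendered unless it sets enabled: false (inferred from admin-api being absent from the templated Service).

```python
# Hand-trimmed copy of the `listeners` values from this issue.
LISTENERS = {
    "admin": {"external": {
        "admin-api": {"advertisedPorts": [30644], "enabled": False},
        "default": {"advertisedPorts": [31644]},
    }},
    "http": {"external": {
        "default": {"advertisedPorts": [30082]},
        "http-proxy": {"advertisedPorts": [30082], "enabled": True},
    }},
    "kafka": {"external": {
        "default": {"advertisedPorts": [31092]},
        "kafka-api": {"advertisedPorts": [30092], "enabled": True},
    }},
    "rpc": {},
    "schemaRegistry": {"external": {
        "default": {"advertisedPorts": [30081]},
        "schema-registry": {"advertisedPorts": [30081], "enabled": True},
    }},
}


def find_advertised_port_clashes(listeners):
    """Return {port: [listener paths]} for ports claimed more than once.

    Hypothetical helper, not part of the chart or the operator.
    """
    claims = {}
    for api_name, api in listeners.items():
        for ext_name, ext in (api.get("external") or {}).items():
            # Assumption: a listener is rendered unless it sets `enabled: false`.
            if ext.get("enabled", True) is False:
                continue
            for port in ext.get("advertisedPorts") or []:
                claims.setdefault(port, []).append(f"{api_name}.external.{ext_name}")
    return {port: names for port, names in claims.items() if len(names) > 1}


if __name__ == "__main__":
    for port, names in sorted(find_advertised_port_clashes(LISTENERS).items()):
        print(f"advertised port {port} is claimed by: {', '.join(names)}")
```

A check like this would name the two listeners that collide, which the Kubernetes error ("provided port is already allocated") does not do on its own.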

I'm not sure what we agreed to for this ticket. If the idea is just to shut off service creation so that this values file is written out to the internal redpanda.yaml (even though the external configuration is not correct for k8s), then you can proceed with:

  # -- Service allows you to manage the creation of an external kubernetes service object
  service:
    # -- Enabled if set to false will not create the external service type
    # You can still set your cluster with external access but not create the supporting service (NodePort/LoadBalancer).
    # Set this to false if you would rather manage your own service.
    enabled: false

If that is the case, then we can close this ticket. Otherwise, we can help with documentation. I am not convinced that additional validation would help in this situation.

alejandroEsc avatar Apr 04 '24 19:04 alejandroEsc

If you let me disable the operator's default listeners, I'll be on my way. I don't need them, but I need the ports to keep them aligned with AWS's and GCP's.

c4milo avatar Apr 11 '24 03:04 c4milo

If you let me disable the operator's default listeners, I'll be on my way. I don't need them, but I need the ports to keep them aligned with AWS's and GCP's.

Let's talk and see if we can figure out what you require. I'm not sure we can disable listeners; I have never tried.

alejandroEsc avatar Apr 12 '24 17:04 alejandroEsc

@c4milo if you have problems configuring node ports, please re-open this issue.

RafalKorepta avatar Aug 02 '24 08:08 RafalKorepta