Expanding PVC Volume Template Results in Data Loss
When we try to expand the PVC volume template, the operator deletes and re-creates the PVC volumes instead of just resizing them. We are using Rook-Ceph as the storage provider and have successfully resized PVCs there without a delete/re-create. We can also manually edit the PVC itself and it will expand. We are using version 0.22.2 of the operator, and I've reproduced this in multiple clusters.
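For reference, this is roughly how we resize a PVC by hand and it expands in place (a minimal sketch; the PVC name and namespace match one of the affected PVCs, and the target size is just an example):

# patch the PVC's storage request directly; Rook-Ceph expands the volume online
kubectl -n clark-developer-featbit patch pvc default-chi-clickhouse-replicated-0-0-0 \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"60Gi"}}}}'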
We have also tried it without the storageManagement options, and that just results in a loop where the operator continually tries to delete and re-create the PVCs:
storageManagement:
provisioner: Operator
reclaimPolicy: Retain
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: "clickhouse"
spec:
defaults:
templates:
dataVolumeClaimTemplate: default
podTemplate: clickhouse:23.7.1.2470-alpine
storageManagement:
provisioner: Operator
reclaimPolicy: Retain
configuration:
settings:
# to allow scrape metrics via embedded prometheus protocol
prometheus/endpoint: /metrics
prometheus/port: 8888
prometheus/metrics: true
prometheus/events: true
prometheus/asynchronous_metrics: true
zookeeper:
nodes:
- host: clickhouse-keeper.clickhouse.svc.cluster.local
users:
default/networks/ip: "::/0"
default/password: password
default/profile: default
# use cluster Pod CIDR for more security
backup/networks/ip: 0.0.0.0/0
# PASSWORD=backup_password; echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
backup/password_sha256_hex: eb94c11d77f46a0290ba8c4fca1a7fd315b72e1e6c83146e42117c568cc3ea4d
clusters:
- name: replicated
layout:
shardsCount: 1
replicasCount: 3
files:
config.xml: |
<?xml version="1.0"?>
<yandex>
<remote_servers>
<!-- Test only shard config for testing distributed storage -->
<ch_cluster>
<shard>
<internal_replication>True</internal_replication>
<replica>
<host>chi-clickhouse-replicated-0-0</host>
<port>9000</port>
<secure>0</secure>
</replica>
<replica>
<host>chi-clickhouse-replicated-0-1</host>
<port>9000</port>
<secure>0</secure>
</replica>
<replica>
<host>chi-clickhouse-replicated-0-2</host>
<port>9000</port>
<secure>0</secure>
</replica>
</shard>
</ch_cluster>
</remote_servers>
<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
Values for substitutions are specified in /clickhouse/name_of_substitution elements in that file.
-->
<!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
Optional. If you don't use replicated tables, you could omit that.
See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/
-->
<zookeeper>
<node>
<host>clickhouse-keeper.clickhouse.svc.cluster.local</host>
<port>2181</port>
<secure>0</secure>
</node>
</zookeeper>
<!--
OpenTelemetry log contains OpenTelemetry trace spans.
-->
<opentelemetry_span_log>
<!--
The default table creation code is insufficient, this <engine> spec
is a workaround. There is no 'event_time' for this log, but two times,
start and finish. It is sorted by finish time, to avoid inserting
data too far away in the past (probably we can sometimes insert a span
that is seconds earlier than the last span in the table, due to a race
between several spans inserted in parallel). This gives the spans a
global order that we can use to e.g. retry insertion into some external
system.
-->
<engine>
engine MergeTree
partition by toYYYYMM(finish_date)
order by (finish_date, finish_time_us, trace_id)
</engine>
<database>system</database>
<table>opentelemetry_span_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</opentelemetry_span_log>
</yandex>
templates:
volumeClaimTemplates:
- name: default
spec:
accessModes:
- ReadWriteOnce
reclaimPolicy: Retain
resources:
requests:
storage: 55Gi
podTemplates:
- name: clickhouse:23.7.1.2470-alpine
metadata:
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '8888'
prometheus.io/path: '/metrics'
# need separate prometheus scrape config, look to https://github.com/prometheus/prometheus/issues/3756
clickhouse.backup/scrape: 'true'
clickhouse.backup/port: '7171'
clickhouse.backup/path: '/metrics'
spec:
containers:
- name: clickhouse-pod
image: clickhouse-server:23.7.1.2470-alpine
- name: clickhouse-backup
image: clickhouse-backup:latest
imagePullPolicy: Always
command:
- bash
- -xc
- "/bin/clickhouse-backup server"
env:
- name: CLICKHOUSE_PASSWORD
value: password
- name: LOG_LEVEL
value: "debug"
- name: ALLOW_EMPTY_BACKUPS
value: "true"
- name: API_LISTEN
value: "0.0.0.0:7171"
# INSERT INTO system.backup_actions to execute backup
- name: API_CREATE_INTEGRATION_TABLES
value: "true"
- name: BACKUPS_TO_KEEP_REMOTE
value: "3"
# change it for production S3
- name: REMOTE_STORAGE
value: "s3"
- name: S3_ACL
value: "private"
- name: S3_ENDPOINT
value: https://minio
- name: S3_BUCKET
value: clickhouse-backups
# {shard} macro defined by clickhouse-operator
- name: S3_PATH
value: backup/shard-{shard}
- name: S3_ACCESS_KEY
value: clickhouse_backups_rw
- name: S3_DISABLE_CERT_VERIFICATION
value: "true"
- name: S3_SECRET_KEY
value: password
- name: S3_FORCE_PATH_STYLE
value: "true"
ports:
- name: backup-rest
containerPort: 7171
Thanks. Would it be possible to attach the operator log as a file to this case? I would like to see if there is an issue with operator reconciliation. If you can access the Rook logs, please attach those as well.
@tman5 , could you show your storage classes?
kubectl get storageclasses -o wide
And it would be useful to see one of the PVCs created by the operator.
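Something like this should dump them all (the label selector matches what the operator puts on its PVCs; replace the namespace with yours):

kubectl get pvc -n <namespace> -l clickhouse.altinity.com/chi=clickhouse -o yaml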
NAME                          PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-bucket                   rook-ceph.ceph.rook.io/bucket   Delete          Immediate           false                  208d
ceph-filesystem               rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   208d
rook-ceph-block (default)     rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   208d
sc-smb-mssql-database-repos   smb.csi.k8s.io                  Retain          Immediate           false                  182d
sc-smb-mssql-deploy-scripts   smb.csi.k8s.io                  Retain          Immediate           false                  182d
sc-smb-mssql-wss              smb.csi.k8s.io                  Retain          Immediate           false                  182d
This is one of the PVCs that will perpetually be in a terminating state:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
creationTimestamp: "2024-04-02T12:05:28Z"
deletionGracePeriodSeconds: 0
deletionTimestamp: "2024-04-02T12:05:34Z"
finalizers:
- kubernetes.io/pvc-protection
labels:
argocd.argoproj.io/instance: featbit-clickhouse-dev2
clickhouse.altinity.com/app: chop
clickhouse.altinity.com/chi: clickhouse
clickhouse.altinity.com/cluster: replicated
clickhouse.altinity.com/namespace: clark-developer-featbit
clickhouse.altinity.com/object-version: 241ccf05924775f258c440aecb86eecc549bb3ce
clickhouse.altinity.com/reclaimPolicy: Retain
clickhouse.altinity.com/replica: "0"
clickhouse.altinity.com/shard: "0"
name: default-chi-clickhouse-replicated-0-0-0
namespace: clark-developer-featbit
resourceVersion: "298826497"
uid: f9ea50da-82a6-47b9-9231-8a53022d5d03
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 60Gi
storageClassName: rook-ceph-block
volumeMode: Filesystem
volumeName: pvc-f9ea50da-82a6-47b9-9231-8a53022d5d03
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 60Gi
phase: Bound
E0402 12:08:00.875175 1 creator.go:175] updatePersistentVolumeClaim():clark-developer-featbit/default-chi-clickhouse-replicated-0-1-0:unable to Update PVC err: Operation cannot be fulfilled on persistentvolumeclaims "default-chi-clickhouse-replicated-0-1-0": the object has been modified; please apply your changes to the latest version and try again
E0402 12:08:00.875219 1 worker-chi-reconciler.go:1000] reconcilePVCFromVolumeMount():ERROR unable to reconcile PVC(clark-developer-featbit/default-chi-clickhouse-replicated-0-1-0) err: Operation cannot be fulfilled on persistentvolumeclaims "default-chi-clickhouse-replicated-0-1-0": the object has been modified; please apply your changes to the latest version and try again
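In case it is useful, this is how we check what is still holding the stuck PVC (kubectl describe lists the mounting pods under "Used By" and shows the related events):

kubectl -n clark-developer-featbit describe pvc default-chi-clickhouse-replicated-0-0-0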
It means something else, such as ArgoCD, changed the PVC.
Could you try to deploy the CHI without ArgoCD and then try to rescale?
Is there a way to make it work with Argo?
Errors like that cannot lead to PVC deletion on their own. I wonder if it was actually ArgoCD that deleted it?
@tman5 Assuming you are using Argo CD, can you describe how you have configured CI/CD and exactly what steps you follow to change the volume size? It seems possible that multiple actors are trying to manage the CHI resources, or at least the underlying volumes.
P.S. Argo CD is normally fine with changes to storage size; I've done it many times on AWS EBS volumes.
This is my Argo CD config:
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: clickhouse
namespace: argo-cd
spec:
destination:
namespace: clickhouse
server: https://kube-server
project: dev
source:
path: ./overlays/dev1/clickhouse
repoURL: https://repo.local
targetRevision: master
syncPolicy:
automated:
prune: true
selfHeal: true
retry:
backoff:
duration: 5s
factor: 2
maxDuration: 3m0s
limit: 2
syncOptions:
- CreateNamespace=true
- PruneLast=true
- PrunePropagationPolicy=foreground
- ServerSideApply=true
- --sync-hook-timeout=60s
- --sync-wait=60s
It points to a repo that has a kustomize file:
---
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
resources:
- ../../../base/clickhouse-keeper/
- ../clickhouse-operator/
- manifest.yml
- clickhouse-backup-rw-password.yml
namespace: clickhouse
...
Then the manifest file is what I posted above. I edit the PVC size in that manifest, commit it to the repo, and then let Argo do its thing.
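The only change I make when resizing is the storage request on the volumeClaimTemplate, e.g. bumping it from 55Gi to 60Gi:

templates:
  volumeClaimTemplates:
    - name: default
      spec:
        resources:
          requests:
            storage: 60Gi   # previously 55Gi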
In the clickhouse-operator directory, this is the kustomize file:
---
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
helmCharts:
- name: altinity-clickhouse-operator
releaseName: clickhouse-operator
namespace: clickhouse
repo: https://docs.altinity.com/clickhouse-operator/
version: 0.22.2
valuesInline:
configs:
configdFiles:
01-clickhouse-02-logger.xml: |
<!-- IMPORTANT -->
<!-- This file is auto-generated -->
<!-- Do not edit this file - all changes would be lost -->
<!-- Edit appropriate template in the following folder: -->
<!-- deploy/builder/templates-config -->
<!-- IMPORTANT -->
<yandex>
<logger>
<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<level>warning</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
<!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
<console>1</console>
</logger>
</yandex>
...
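For what it's worth, the overlay can be rendered locally to see exactly what Argo applies (a sketch; assumes a kustomize version with Helm chart inflation enabled):

kustomize build --enable-helm ./overlays/dev1/clickhouse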
@tman5, it is possible that there is a conflict between ArgoCD and the operator. Try altering the operator configuration to exclude these labels from dependent objects, including PVCs:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseOperatorConfiguration"
metadata:
name: "exclude-argocd-label"
spec:
label:
exclude:
- argocd.argoproj.io/instance
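Apply it into the namespace the operator runs in (the clickhouse namespace per your Helm values); the operator should pick up ClickHouseOperatorConfiguration objects in its namespace and merge them into its runtime configuration. A rough sketch (the file name is just an example):

kubectl -n clickhouse apply -f chop-config-exclude-argocd-label.yaml

An alternative on the Argo CD side (assuming Argo CD v2.2+ and that nothing else depends on the instance label) is to switch resource tracking from the label to an annotation, so Argo CD stops stamping its instance label onto operator-managed objects:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argo-cd
data:
  resource.trackingMethod: annotation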
I can confirm it is a clash between Argo and the operator. It happened to me: using kubectl everything was fine, even when I destroyed the entire nodes and only the PVCs were left.
But once everything was finished and I put my YAMLs into Argo, Argo started syncing and adding the app.kubernetes.io/instance label.
This is what happened after I put my YAMLs into Argo:
Info ReconcileStarted 45m clickhouse-operator reconcile started, task id: 44562628-2111-4328-a3c9-be22829c8eb9
Info UpdateCompleted 45m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-configd
Info UpdateCompleted 45m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-usersd
Info UpdateCompleted 45m clickhouse-operator Update Service success: ch-data-warehouse/service-dwp
Info UpdateCompleted 45m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-configd
Info UpdateCompleted 45m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-usersd
Info UpdateCompleted 45m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-usersd
Info UpdateCompleted 44m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-deploy-confd-dwp-0-0
Info CreateStarted 44m clickhouse-operator Update StatefulSet(ch-data-warehouse/chi-jb-data-warehouse-dwp-0-0) - started
Info UpdateInProgress 44m clickhouse-operator Update StatefulSet(ch-data-warehouse/chi-jb-data-warehouse-dwp-0-0) switch from Update to Recreate
Info CreateStarted 43m clickhouse-operator Create StatefulSet: ch-data-warehouse/chi-jb-data-warehouse-dwp-0-0 - started
Info UpdateCompleted 38m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-usersd
Info CreateCompleted 38m clickhouse-operator Create StatefulSet: ch-data-warehouse/chi-jb-data-warehouse-dwp-0-0 - completed
Info UpdateCompleted 38m clickhouse-operator Update Service success: ch-data-warehouse/service-jb-data-warehouse-0-0
Info ProgressHostsCompleted 38m clickhouse-operator [now: 2025-02-12 04:59:36.643838886 +0000 UTC m=+86464.463691633] ProgressHostsCompleted: 1 of 4
Info UpdateCompleted 38m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-configd
Info ReconcileCompleted 38m clickhouse-operator Reconcile Host completed. Host: 0-0 ClickHouse version running: 24.8.13.16
Info UpdateCompleted 38m clickhouse-operator Update Service success: ch-data-warehouse/service-jb-data-warehouse
Info UpdateCompleted 38m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-usersd
Info UpdateCompleted 38m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-configd
Info UpdateCompleted 38m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-common-usersd
Info UpdateCompleted 37m clickhouse-operator Update ConfigMap ch-data-warehouse/chi-jb-data-warehouse-deploy-confd-dwp-0-1
Info CreateStarted 37m clickhouse-operator Update StatefulSet(ch-data-warehouse/chi-jb-data-warehouse-dwp-0-1) - started
Info UpdateInProgress 36m clickhouse-operator Update StatefulSet(ch-data-warehouse/chi-jb-data-warehouse-dwp-0-1) switch from Update to Recreate
Info CreateStarted 36m clickhouse-operator Create StatefulSet: ch-data-warehouse/chi-jb-data-warehouse-dwp-0-1 - started