
Question: change keeper replica count (decrease 5 -> 3)

Open lindesvard opened this issue 9 months ago • 20 comments

So I have a ClickHouse cluster of 3 shards and 2 replicas. When I created the keeper I added 5 keeper replicas (not sure why).

My question is: is it safe to just decrease that value to 3 keepers and update spec.configuration.zookeeper.nodes to point to the remaining 3 keepers? (See the sketch after the manifests below.)

ClickHouseKeeperInstallation:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: keeper
spec:
  configuration:
    clusters:
      - name: "cluster-name"
        layout:
          replicasCount: 5
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"

  defaults:
    templates:
      podTemplate: default
      dataVolumeClaimTemplate: default

  templates:
    podTemplates:
      - name: default
        metadata:
          labels:
            app: clickhouse-keeper
        spec:
          nodeSelector:
            node.kubernetes.io/instance-type: ccx13
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: "app"
                        operator: In
                        values:
                          - clickhouse-keeper
                  topologyKey: "kubernetes.io/hostname"
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: clickhouse/clickhouse-keeper:24.12
              resources:
                requests:
                  memory: "500Mi"
                  cpu: "0.2"
                limits:
                  memory: "1Gi"
                  cpu: "1"
          securityContext:
            fsGroup: 101

    volumeClaimTemplates:
      - name: default
        reclaimPolicy: Retain
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi

ClickhouseInstallation (relevant part):

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "chi-cluster"
spec:
  defaults:
    templates: 
      dataVolumeClaimTemplate: default
      podTemplate: clickhouse-pod-template
      serviceTemplate: svc-template
  configuration:
    zookeeper:
      nodes:
        - host: chk-keeper-openpanel-0-0
          port: 2181
        - host: chk-keeper-openpanel-0-1
          port: 2181
        - host: chk-keeper-openpanel-0-2
          port: 2181
        - host: chk-keeper-openpanel-0-3
          port: 2181
        - host: chk-keeper-openpanel-0-4
          port: 2181
    clusters:
      - name: "cluster-name"
        secure: "yes"
        layout:
          shardsCount: 3
          replicasCount: 2

lindesvard avatar Feb 14 '25 10:02 lindesvard

Unfortunately, keeper scale-up and scale-down operations are currently not well tested, and the keeper can end up in CrashLoopBackOff ...

You need to rehearse this first on a local minikube to see how it behaves.
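
A minimal rehearsal sketch, assuming the operator is already installed in the test cluster and the CHK above is saved as keeper.yaml (file names are placeholders):

# spin up a throwaway cluster and apply the same keeper manifest
minikube start
kubectl apply -f keeper.yaml

# watch the 5 keeper pods come up (pods carry the app: clickhouse-keeper label from the podTemplate)
kubectl get pods -l app=clickhouse-keeper -w

# then change replicasCount from 5 to 3, re-apply, and check that the remaining
# pods keep quorum (e.g. via the "stat"/"mntr" four-letter-word commands on port 2181)
kubectl apply -f keeper-3-replicas.yaml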

Slach avatar Feb 14 '25 12:02 Slach

Thanks for fast reply.

Will give this a spin in a clean cluster and see what happens.

Do you think size of the cluster/ingestion speed would have any impact on the outcome?

lindesvard avatar Feb 14 '25 22:02 lindesvard

Actually no. 3 keeper replicas with enough CPU/RAM/disk speed are enough for 99% of use cases.

clickhouse-server makes only one persistent session, to a single random keeper replica from the zookeeper config.

Slach avatar Feb 15 '25 02:02 Slach

The key metric for speed is network latency between clickhouse-server and clickhouse-keeper, and between the keepers themselves.

Slach avatar Feb 15 '25 02:02 Slach

is there anything new with the scaling stability?

Brainpitcher avatar Mar 07 '25 12:03 Brainpitcher

Well, yesterday we tried to scale up ch-keeper replicas from 1 to 3 and boom... 2 of them work well and 1 got into an endless CrashLoopBackOff. Luckily it was an empty staging cluster. The question is how to do this in a production environment, especially after the last breaking changes with keeper, I mean [release-0.24.0] and all that PVC stuff.

Brainpitcher avatar Mar 13 '25 09:03 Brainpitcher

@Brainpitcher do you use operator 0.24.x for keeper?

Slach avatar Mar 13 '25 09:03 Slach

@Brainpitcher do you use operator 0.24.x for keeper?

Yep, now it is version 0.24.4, but we started in production with 0.23.7 and had to follow some instructions to do the keeper migration.

Brainpitcher avatar Mar 13 '25 10:03 Brainpitcher

Try to delete the PV + PVC + pod which is in CrashLoopBackOff status.
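
For example (names are placeholders; adjust namespace and object names to your cluster):

# find the keeper replica stuck in CrashLoopBackOff
kubectl get pods -n <namespace> | grep CrashLoopBackOff

# remove its PVC, the PV bound to it, and the pod itself, so the operator can recreate them
kubectl delete pvc <pvc-of-stuck-replica> -n <namespace>
kubectl delete pv <pv-bound-to-that-pvc>
kubectl delete pod <stuck-keeper-pod> -n <namespace>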

Slach avatar Mar 13 '25 13:03 Slach

Try to delete the PV + PVC + pod which is in CrashLoopBackOff status.

I wasn't able to reproduce the situation; on a new cluster the keepers scaled up and down without any trouble.

But the last question still remains: how do we update and scale up ch-keeper replicas in production after migration steps like setting the PVC name in the ch-keeper manifest?

     volumeClaimTemplates:
      - name: default
        metadata:
          name: both-paths
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: universalssd
          volumeName: pvc-768d3f5f-ed7d-4c1f-8206-32d5dd7b2b25
          resources:
            requests:
              storage: 5Gi

I mean, what will happen when we increase the replica number to 3?

Brainpitcher avatar Mar 14 '25 08:03 Brainpitcher

In the normal case, with replicasCount: 3 clickhouse-operator 0.24.x will create a separate StatefulSet + PVC for each new replica.
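
You can check it after the scale-up with something like this (assuming the CHK is named keeper, as in the manifest above):

# each replica gets its own StatefulSet and PVC, e.g. ...-0-0, ...-0-1, ...-0-2
kubectl get statefulsets,pvc | grep chk-keeper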

Slach avatar Mar 14 '25 09:03 Slach

In the normal case, with replicasCount: 3 clickhouse-operator 0.24.x will create a separate StatefulSet + PVC for each new replica.

So, what we have now:

  1. We scale up replicas from 1 to 3
  2. The operator creates new PVCs and all metadata begins a new life on all replicas
  3. Everybody is happy

Am I right?

Brainpitcher avatar Mar 14 '25 09:03 Brainpitcher

Yes, in the normal case the new replicas shall start as followers, and once quorum is achieved everything will be fine.

In the worst case you can create the keeper from scratch and restore the keeper data from clickhouse-server via SYSTEM RESTORE REPLICA db.table

Slach avatar Mar 14 '25 09:03 Slach

Yes, in the normal case the new replicas shall start as followers, and once quorum is achieved

In the worst case you can create the keeper from scratch and restore the keeper data from clickhouse-server via SYSTEM RESTORE REPLICA db.table

You mean that all the keeper metadata is in the cluster, and I have to run SYSTEM RESTORE REPLICA for all my DBs?

Brainpitcher avatar Mar 14 '25 10:03 Brainpitcher

You mean that all the keeper metadata is in the cluster, and I have to run SYSTEM RESTORE REPLICA for all my DBs?

In the worst case, if you lost the keeper data, you can create the keeper from scratch (delete PV, delete CHK, apply CHK)

and use SYSTEM RESTORE REPLICA db.table for all replicated tables.

Something like this for a disaster recovery plan:

clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress
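
Putting the two steps together, a rough sketch of that plan (the chk resource short name, the manifest file name, and the PVC/PV names are placeholders/assumptions):

# 1. recreate the keeper from scratch
kubectl delete chk keeper
kubectl delete pvc <keeper-pvc-names>
kubectl delete pv <released-keeper-pvs>
kubectl apply -f keeper.yaml

# 2. regenerate and run SYSTEM RESTORE REPLICA for every replicated table
clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress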

Slach avatar Mar 14 '25 11:03 Slach

You mean that all the keeper metadata is in the cluster, and I have to run SYSTEM RESTORE REPLICA for all my DBs?

In the worst case, if you lost the keeper data, you can create the keeper from scratch (delete PV, delete CHK, apply CHK)

and use SYSTEM RESTORE REPLICA db.table for all replicated tables.

Something like this for a disaster recovery plan:

clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress

many thanks

Brainpitcher avatar Mar 14 '25 11:03 Brainpitcher

Just finished with the second cluster. This time I tried to update ch-keeper to 24.12-alpine and then increased the replica number. After the update everything went smoothly, but after the increase the quorum was lost and the CHI went read-only :(

SYSTEM RESTORE REPLICA was the cure, but only after detaching/attaching the tables.

I got logs with things like

2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387386 [ 50 ] {} <Fatal> BaseDaemon: Report this error to https://github.com/ClickHouse/ClickHouse/issues

but I think that should be a separate issue.

Brainpitcher avatar Mar 18 '25 10:03 Brainpitcher

I just want to ask (I'm not an expert in this field) since I might run into these issues as well.

After updating replicas, did you do the following?

  1. DETACH TABLE table ON CLUSTER '{cluster}' for all tables in the db?
  2. ATTACH TABLE table ON CLUSTER '{cluster}' for all tables in the db?
  3. clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress

Would it help to just detach the entire DB instead of each table?

lindesvard avatar Mar 18 '25 13:03 lindesvard

I just want to ask (I'm not an expert in this field) since I might run into these issues as well.

After updating replicas, did you do the following?

  1. DETACH TABLE table ON CLUSTER '{cluster}' for all tables in the db?
  2. ATTACH TABLE table ON CLUSTER '{cluster}' for all tables in the db?
  3. clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress

Would it help to just detach the entire DB instead of each table?

You just have to detach and attach the replicated tables; you can find all of them with something like this:

SELECT
    database,
    name,
    engine
FROM system.tables
WHERE engine LIKE 'Replicated%'

and you get something like:

Query id: 51cec1e2-02ce-4399-90b1-b1cb22a38315

   ┌─database─┬─name──────────┬─engine──────────────┐
1. │ test_db  │ replica_table │ ReplicatedMergeTree │
   └──────────┴───────────────┴─────────────────────┘

1 row in set. Elapsed: 0.004 sec. 

So in my case all of them had to be detached and attached before SYSTEM RESTORE REPLICA.

I did it with:

# detach each replicated table one at a time
for table in \
    "test_db.replica_table" \
    "test_db.replica_table1"; do
    clickhouse-client -q "DETACH TABLE $table;" --echo --progress
done

and

# attach them back in the same order
for table in \
    "test_db.replica_table" \
    "test_db.replica_table1"; do
    clickhouse-client -q "ATTACH TABLE $table;" --echo --progress
done

Mostly because I wanted to control the way they were detached and attached; sometimes when you try to pipe them all at once you may find that a table is locked because someone is using it, and then the script fails.
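
If you prefer to generate the list instead of hard-coding it, the same pattern as the restore pipe above could be reused; just a sketch, generating both files up front while the tables are still attached:

# generate DETACH and ATTACH statements for every replicated table, review them, then run them manually
clickhouse-client -q "SELECT concat('DETACH TABLE ', database, '.', table, ';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" > detach_tables.sql
clickhouse-client -q "SELECT concat('ATTACH TABLE ', database, '.', table, ';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" > attach_tables.sql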

Brainpitcher avatar Mar 18 '25 14:03 Brainpitcher

@Brainpitcher appreciate the detailed explanation 🙏 thanks a lot

lindesvard avatar Mar 19 '25 10:03 lindesvard