Question: change keeper replica count (decrease 5 -> 3)
So I have a ClickHouse cluster of 3 shards and 2 replicas. When I created the keeper I added 5 keepers (not sure why).
My question is: is it safe to just decrease that value to 3 keepers and update the config spec.configuration.zookeeper.nodes to point to the 3 keepers?
ClickHouseKeeperInstallation:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: keeper
spec:
  configuration:
    clusters:
      - name: "cluster-name"
        layout:
          replicasCount: 5
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"
  defaults:
    templates:
      podTemplate: default
      dataVolumeClaimTemplate: default
  templates:
    podTemplates:
      - name: default
        metadata:
          labels:
            app: clickhouse-keeper
        spec:
          nodeSelector:
            node.kubernetes.io/instance-type: ccx13
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: "app"
                        operator: In
                        values:
                          - clickhouse-keeper
                  topologyKey: "kubernetes.io/hostname"
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: clickhouse/clickhouse-keeper:24.12
              resources:
                requests:
                  memory: "500Mi"
                  cpu: "0.2"
                limits:
                  memory: "1Gi"
                  cpu: "1"
          securityContext:
            fsGroup: 101
    volumeClaimTemplates:
      - name: default
        reclaimPolicy: Retain
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
ClickHouseInstallation (relevant part):
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "chi-cluster"
spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: default
      podTemplate: clickhouse-pod-template
      serviceTemplate: svc-template
  configuration:
    zookeeper:
      nodes:
        - host: chk-keeper-openpanel-0-0
          port: 2181
        - host: chk-keeper-openpanel-0-1
          port: 2181
        - host: chk-keeper-openpanel-0-2
          port: 2181
        - host: chk-keeper-openpanel-0-3
          port: 2181
        - host: chk-keeper-openpanel-0-4
          port: 2181
    clusters:
      - name: "cluster-name"
        secure: "yes"
        layout:
          shardsCount: 3
          replicasCount: 2
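For clarity, the change I'm asking about would look roughly like this (just a sketch, assuming I keep the first three keeper replicas and the generated host names stay the same):

ClickHouseKeeperInstallation (changed part):

  spec:
    configuration:
      clusters:
        - name: "cluster-name"
          layout:
            replicasCount: 3   # was 5

ClickHouseInstallation (changed part):

  spec:
    configuration:
      zookeeper:
        nodes:
          - host: chk-keeper-openpanel-0-0
            port: 2181
          - host: chk-keeper-openpanel-0-1
            port: 2181
          - host: chk-keeper-openpanel-0-2
            port: 2181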
Unfortunately, keeper scale-up and scale-down operations are currently not well tested, and keeper could switch into CrashLoopBackOff...
You need to train first on a local minikube to see how it behaves.
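For example, a rough local rehearsal could look like this (only a sketch; the install-bundle URL is the one from the clickhouse-operator README, and chk-keeper.yaml is a placeholder for your CHK manifest):

  # start a throwaway cluster
  minikube start

  # install clickhouse-operator (0.24.x also manages ClickHouseKeeperInstallation)
  kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml

  # apply the CHK with replicasCount: 5 and wait for all keeper pods to become Ready
  kubectl apply -f chk-keeper.yaml
  kubectl get pods -l app=clickhouse-keeper -w

  # then change replicasCount to 3, re-apply, and watch how pods and PVCs are handled
  kubectl apply -f chk-keeper.yaml
  kubectl get pods,pvc -w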
Thanks for the fast reply.
Will give this a spin in a clean cluster and see what happens.
Do you think the size of the cluster / ingestion speed would have any impact on the outcome?
Actually not. 3 keeper replicas with enough CPU/RAM/disk speed are enough for 99% of use cases.
clickhouse-server makes only one persistent connection (session) to a single random keeper replica from the zookeeper config.
The key metric for speed is network latency between clickhouse-server and clickhouse-keeper, and between the keepers themselves.
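If you want to check which keeper replica a given clickhouse-server picked, recent ClickHouse versions expose it in a system table (assuming system.zookeeper_connection exists in your release):

  SELECT name, host, port
  FROM system.zookeeper_connection;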
is there anything new with the scaling stability?
Well, yesterday we tried to scale up ch-keeper replicas from 1 to 3 and boom... 2 of them work well and 1 got into an endless CrashLoopBackOff. Luckily it was an empty staging cluster. The question is how to do this in a production environment, especially after the last breaking changes with keeper, I mean [release-0.24.0] and all that PVC stuff.
@Brainpitcher do you use operator 0.24.x for keeper?
Yeap, now it is version 0.24.4, but we started in production with 0.23.7 and had to use some instructions to do the keeper migration.
Try to delete the PV + PVC + pod which have CrashLoopBackOff status.
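Something along these lines (the pod/PVC/PV names are placeholders, take the real ones from kubectl get output for the crash-looping replica):

  # find the crash-looping keeper pod and its PVC/PV
  kubectl get pods,pvc -l app=clickhouse-keeper
  kubectl get pv | grep chk-keeper

  # delete the broken replica's storage and pod so the operator recreates them empty
  kubectl delete pvc <pvc-of-broken-replica>
  kubectl delete pv <pv-of-broken-replica>
  kubectl delete pod <crashlooping-keeper-pod>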
I wasn't able to reproduce the situation; on a new cluster the keepers scaled up and down without any troubles.
But the last question still remains: how to update and scale up ch-keeper replicas in production after the migration steps, like setting the PVC name in the ch-keeper manifest?
volumeClaimTemplates:
  - name: default
    metadata:
      name: both-paths
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: universalssd
      volumeName: pvc-768d3f5f-ed7d-4c1f-8206-32d5dd7b2b25
      resources:
        requests:
          storage: 5Gi
I mean, what will happen when we increase the replica count to 3?
In the normal case, with replicas: 3, clickhouse-operator 0.24.x will create a separate StatefulSet + PVC for each new replica.
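A quick way to see that (assuming the generated objects follow the usual chk-<name>-... naming pattern for a CHK called keeper):

  # expect one StatefulSet and one PVC per keeper replica
  kubectl get statefulsets,pvc -n <namespace> | grep chk-keeper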
So, what we have now:
- we scale up replicas from 1 to 3
- the operator creates new PVCs and all meta begins a new life on all replicas
- everybody is happy
Am I right?
Yes, in the normal case new replicas shall start as followers, and once quorum is achieved everything will be fine.
In the worst case you can create keeper from scratch
and restore keeper data from clickhouse-server via SYSTEM RESTORE REPLICA db.table.
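Since the CHK above whitelists all four-letter-word commands, you can watch the quorum forming by polling the keepers directly (host names taken from the zookeeper config above; this assumes nc is available wherever you run it):

  # healthy members report zk_server_state leader/follower; a replica still joining answers nothing useful
  for host in chk-keeper-openpanel-0-0 chk-keeper-openpanel-0-1 chk-keeper-openpanel-0-2; do
    echo -n "$host: "
    echo mntr | nc "$host" 2181 | grep zk_server_state
  done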
You mean that all keeper meta is in the cluster, and I have to run SYSTEM RESTORE REPLICA for all my DBs?
In the worst case, if you lost keeper data, you can create keeper from scratch (delete PV, delete CHK, apply CHK)
and use SYSTEM RESTORE REPLICA db.table for all replicated tables.
Something like that for a disaster recovery plan:
clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress
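Put together, the worst-case recovery could look roughly like this (a sketch only; it assumes chk is registered as the short name for ClickHouseKeeperInstallation, the CHK is called keeper, its manifest is chk-keeper.yaml, and the PVC names come from kubectl get pvc):

  # 1. remove the broken keeper installation and its storage
  kubectl delete chk keeper
  kubectl get pvc | grep chk-keeper        # note the keeper PVC names
  kubectl delete pvc <keeper-pvc-names>    # plus the released PVs if the reclaim policy is Retain

  # 2. recreate keeper from scratch
  kubectl apply -f chk-keeper.yaml

  # 3. restore replication metadata from clickhouse-server for every Replicated* table
  clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress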
many thanks
Just finished with the second cluster. This time I tried to update ch-keeper to 24.12-alpine and then increased the replica number. The update went smoothly, but after the increase the quorum was lost and the CHI went to read-only :(
SYSTEM RESTORE REPLICA was the cure, but only after detaching/attaching the tables.
I got logs with things like:
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387386 [ 50 ] {} <Fatal> BaseDaemon: Report this error to https://github.com/ClickHouse/ClickHouse/issues
but I think that should be a separate issue.
I just want to ask (not an expert in this field), since I might run into these issues as well.
After updating replicas, did you do the following?
- DETACH TABLE table ON CLUSTER '{cluster}' for all tables in the db?
- ATTACH TABLE table ON CLUSTER '{cluster}' for all tables in the db?
- clickhouse-client -q "SELECT concat('SYSTEM RESTORE REPLICA ',database,'.',table,';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" | clickhouse-client -mn --echo --progress
Would it help to just detach the entire DB instead of each table?
You just have to detach and attach replicated tables; you can find all of them with something like this:
SELECT
database,
name,
engine
FROM system.tables
WHERE engine LIKE 'Replicated%'
and getting something like:

Query id: 51cec1e2-02ce-4399-90b1-b1cb22a38315

   ┌─database─┬─name──────────┬─engine──────────────┐
1. │ test_db  │ replica_table │ ReplicatedMergeTree │
   └──────────┴───────────────┴─────────────────────┘

1 row in set. Elapsed: 0.004 sec.
So in my case all of them should be detached and attached before SYSTEM RESTORE REPLICA.
I did it with:

for table in \
  "test_db.replica_table" \
  "test_db.replica_table1"; do
  clickhouse-client -q "DETACH TABLE $table;" --echo --progress
done

and

for table in \
  "test_db.replica_table" \
  "test_db.replica_table1"; do
  clickhouse-client -q "ATTACH TABLE $table;" --echo --progress
done
Mostly because I wanted to control the way they were detached and attached; sometimes when you try to pipe them, you may find that a table is locked because someone is using it, and then the script fails.
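That said, if you'd rather generate the statements than list the tables by hand, a variation of the earlier one-liner can write them to files first so they can be reviewed (and re-run table by table if something is locked). Note that both files must be generated before detaching anything, because detached tables disappear from system.tables:

  # generate the statements up front
  clickhouse-client -q "SELECT concat('DETACH TABLE ', database, '.', name, ';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" > detach.sql
  clickhouse-client -q "SELECT concat('ATTACH TABLE ', database, '.', name, ';') FROM system.tables WHERE engine LIKE 'Replicated%' FORMAT TSVRaw" > attach.sql

  # run them phase by phase, same order as the manual loops above
  clickhouse-client -mn --echo --progress < detach.sql
  clickhouse-client -mn --echo --progress < attach.sql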
@Brainpitcher appreciate the detailed explanation, thanks a lot