clickhouse-operator
Is taking data backups of PVCs with Velero possible?
I have a question: will a data backup taken through Velero be enough to restore? Do I need to take special care to maintain consistency of the data and avoid corruption?
We don't have enough experience with Velero.
Try applying the following configs in the CHI; note that they could affect performance:
spec:
  configuration:
    files:
      users.d/fsync_metadata.xml: |-
        <clickhouse>
          <profiles><default><fsync_metadata>1</fsync_metadata></default></profiles>
        </clickhouse>
      config.d/merge_tree_fsync.xml: |-
        <clickhouse>
          <merge_tree>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_part_directory>1</fsync_part_directory>
            <min_compressed_bytes_to_fsync_after_fetch>1</min_compressed_bytes_to_fsync_after_fetch>
            <min_compressed_bytes_to_fsync_after_merge>1</min_compressed_bytes_to_fsync_after_merge>
            <min_rows_to_fsync_after_merge>1</min_rows_to_fsync_after_merge>
          </merge_tree>
        </clickhouse>
      users.d/distributed_fsync.xml: |-
        <clickhouse>
          <profiles><default>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_directories>1</fsync_directories>
          </default></profiles>
        </clickhouse>
Could you notify us and provide your Velero manifest if you succeed?
I was able to back up and restore using a basic Velero configuration. But I am not able to find a way to quiesce the database during backup. I need this in order to take consistent backups that are not affected by write operations happening at the same time as the backup.
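For reference, a minimal sketch of the kind of Velero Backup resource that fits this setup; all names, namespaces and values here are illustrative assumptions, not the actual manifest used:

# Sketch only; adjust names, namespace and TTL to your environment.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: clickhouse-pvc-backup        # hypothetical backup name
  namespace: velero                  # namespace where Velero is installed
spec:
  includedNamespaces:
    - clickhouse-backup              # namespace holding the CHI and ZooKeeper PVCs
  snapshotVolumes: true              # take volume snapshots of the PVCs
  ttl: 720h0m0s                      # keep the backup for 30 days

The CLI equivalent would be something like velero backup create clickhouse-pvc-backup --include-namespaces clickhouse-backup --snapshot-volumes.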
@manishtradebyte
But I am not able to find a way to quiesce the database during backup.
You could try
SYSTEM STOP MERGES
SYSTEM STOP REPLICATION FETCHES
then detach all engine=Kafka, engine=Nats and engine=RabbitMQ tables; attach them back and run
SYSTEM START MERGES
SYSTEM START REPLICATION FETCHES
when the backup completes. It would have some side effects, though, like replication lag.
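As a rough sketch, the quiesce/resume sequence around the backup could look like this (assuming the fetch-related statements are spelled SYSTEM STOP FETCHES / SYSTEM START FETCHES on your ClickHouse version; verify the exact SYSTEM statement names against your server):

-- Run on each replica before starting the Velero backup:
SYSTEM STOP MERGES;
SYSTEM STOP FETCHES;   -- pause background fetches for Replicated* tables
-- DETACH any engine=Kafka / engine=Nats / engine=RabbitMQ tables here, if present

-- ... take the Velero backup ...

-- After the backup completes:
-- ATTACH the broker-backed tables again, if any were detached
SYSTEM START FETCHES;
SYSTEM START MERGES;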
What do you mean by detach engine?
I don't use any of these table engines (engine=Kafka, engine=Nats, engine=RabbitMQ). Do I need to detach ReplicatedMergeTree and Distributed tables?
Also, is it
SYSTEM START REPLICATION FETCHES
or
SYSTEM START FETCHES?
I mean: execute DETACH/ATTACH TABLE db.kafka_table to stop the background consuming for Kafka, NATS and RabbitMQ tables.
I don't know of anything like SYSTEM STOP MESSAGING BROKER.
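For example, with db.kafka_table as a placeholder for a broker-backed table you actually have:

DETACH TABLE db.kafka_table;   -- before the backup: stops background consuming for this table
-- ... take the backup ...
ATTACH TABLE db.kafka_table;   -- after the backup: resumes consuming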
I tried to take a backup of the PVCs using Velero for
- clickhouse
- zookeeper
I deleted everything, restored the PVCs from the backup, and then deployed ZooKeeper and ClickHouse (CHI).
Everything seems to work fine, but when I drop a database from the restored cluster, the replica paths of the tables do not seem to get deleted from ZooKeeper. This leads to an error when I try to recreate the same table again.
@manishtradebyte did you use DROP DATABASE db SYNC?
No. When should I run this, after restoring the backup?
How exactly did you "delete everything"?
Basically, I deleted the cluster and its PVCs and removed the ZooKeeper deployment and its PVCs.
@manishtradebyte
Thereafter, did you restore the PVCs + ZooKeeper manifests + ClickHouse manifests with Velero, or did you just restore the PVCs with Velero and re-deploy the manifests manually?
I just restored the PVCs using Velero for both ZK and CHI.
Then I deployed the manifests manually.
In this case, clickhouse-operator will try to restore the schema during restoration. But this is weird: why do you receive "replica path already exists", since /var/lib/clickhouse/metadata should be mounted from the PVC?
Could you share your generated ClickHouse pod manifest in YAML format?
kubectl get pod -n <your-ns> pod-name-0-0-0 -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-03-22T10:11:30Z"
  generateName: chi-clickhouse-cluster_name-0-0-
  labels:
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: clickhouse
    clickhouse.altinity.com/cluster: cluster_name
    clickhouse.altinity.com/namespace: clickhouse-backup
    clickhouse.altinity.com/ready: "yes"
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
    controller-revision-hash: chi-clickhouse-cluster_name-0-0-55dfd6875
    statefulset.kubernetes.io/pod-name: chi-clickhouse-cluster_name-0-0-0
  name: chi-clickhouse-cluster_name-0-0-0
  namespace: clickhouse-backup
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: chi-clickhouse-cluster_name-0-0
    uid: 3c1d2074-6241-44c7-b3f2-db7b8e5e5bd1
  resourceVersion: "170085224"
  uid: 23d437b1-cc7a-4aa8-8cd5-5e6b6984a1fa
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - eu-central-1a
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            clickhouse.altinity.com/app: chop
            clickhouse.altinity.com/chi: clickhouse
            clickhouse.altinity.com/namespace: clickhouse-backup
        topologyKey: kubernetes.io/hostname
  containers:
  - image: clickhouse/clickhouse-server:24.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: clickhouse-pod
    ports:
    - containerPort: 9000
      name: tcp
      protocol: TCP
    - containerPort: 8123
      name: http
      protocol: TCP
    - containerPort: 9009
      name: interserver
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "2"
        memory: 6Gi
      requests:
        cpu: "1"
        memory: 4Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/clickhouse-server/config.d/
      name: chi-clickhouse-common-configd
    - mountPath: /etc/clickhouse-server/users.d/
      name: chi-clickhouse-common-usersd
    - mountPath: /etc/clickhouse-server/conf.d/
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    - mountPath: /var/lib/clickhouse
      name: default
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-c6bp4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostAliases:
  - hostnames:
    - chi-clickhouse-cluster_name-0-0
    ip: 127.0.0.1
  hostname: chi-clickhouse-cluster_name-0-0-0
  nodeName: ip-10-64-195-208.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: chi-clickhouse-cluster_name-0-0
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default
    persistentVolumeClaim:
      claimName: default-chi-clickhouse-cluster_name-0-0-0
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-configd
    name: chi-clickhouse-common-configd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-usersd
    name: chi-clickhouse-common-usersd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    name: chi-clickhouse-deploy-confd-cluster_name-0-0
  - name: kube-api-access-c6bp4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://a992fe1658c93fb0972d3577b613bc1a3cc324008d5fd291726ce0383f25fb0f
    image: docker.io/clickhouse/clickhouse-server:24.1
    imageID: docker.io/clickhouse/clickhouse-server@sha256:7029f00d469e0d5d32f6c2dd3c5fd9110344b5902b4401c05da705a321e3fc86
    lastState: {}
    name: clickhouse-pod
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-03-22T10:12:53Z"
  hostIP: 10.64.195.208
  phase: Running
  podIP: 10.64.195.158
  podIPs:
  - ip: 10.64.195.158
  qosClass: Burstable
  startTime: "2024-03-22T10:12:33Z"
- mountPath: /var/lib/clickhouse
  name: default
The tables that already exist in /var/lib/clickhouse/metadata/ should be attached when the manifests are applied.
Could you share the output of
kubectl describe chi -n clickhouse-backup clickhouse
Where is your operator installed?
kubectl get deployment --all-namespaces | grep clickhouse-operator
The operator is installed in the same namespace.
I tried to apply the restore again and it seems to work. The tables are created, and when I drop them I can recreate them as well.
You can close this issue if you want.
Also, it would be great if this gets resolved: https://github.com/Altinity/clickhouse-backup/issues/860. I am only using Velero because clickhouse-backup doesn't work.
Thanks a lot