clickhouse-operator

Is taking data backups of PVCs with Velero possible?

Open manishtradebyte opened this issue 1 year ago • 16 comments

I have a question: will a data backup taken through Velero be enough to restore? Do I need to take special care to maintain consistency of the data and avoid corruption?

manishtradebyte avatar Feb 29 '24 09:02 manishtradebyte

We don't have enough experience with Velero.

Try applying the following configs in the CHI; note that they could affect performance:

spec:
  configuration:
    files:
      users.d/fsync_metadata.xml: |-
        <clickhouse>
          <profiles><default><fsync_metadata>1</fsync_metadata></default></profiles>
        </clickhouse>
      config.d/merge_tree_fsync.xml: |-
        <clickhouse>
          <merge_tree>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_part_directory>1</fsync_part_directory>
            <min_compressed_bytes_to_fsync_after_fetch>1</min_compressed_bytes_to_fsync_after_fetch>
            <min_compressed_bytes_to_fsync_after_merge>1</min_compressed_bytes_to_fsync_after_merge>
            <min_rows_to_fsync_after_merge>1</min_rows_to_fsync_after_merge>
          </merge_tree>
        </clickhouse>
      users.d/distributed_fsync.xml: |-
        <clickhouse>
          <profiles><default>
            <fsync_after_insert>1</fsync_after_insert>
            <fsync_directories>1</fsync_directories>
          </default></profiles>
        </clickhouse>

Could you notify us and provide your Velero manifest if you succeed?

Slach avatar Feb 29 '24 10:02 Slach

I was able to back up and restore using a basic Velero configuration, but I am not able to find a way to quiesce the database during backup. I need this in order to take consistent backups that are not affected by write operations happening at the same time as the backup.
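
For context, by "basic Velero configuration" I mean essentially a namespace-scoped Backup with volume backup enabled and no hooks. The sketch below is illustrative (names and namespaces are placeholders), not my exact manifest:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: clickhouse-and-zookeeper
  namespace: velero
spec:
  # back up everything, including the PVCs, in the namespace running ClickHouse and ZooKeeper
  includedNamespaces:
    - clickhouse-backup
  # copy volume data via the node agent; CSI volume snapshots would be an alternative
  defaultVolumesToFsBackup: true
  ttl: 720h0m0s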

manishtradebyte avatar Mar 03 '24 17:03 manishtradebyte

@manishtradebyte

But I am not able to find a way to quiesce the database during backup .

You could try

SYSTEM STOP MERGES
SYSTEM STOP REPLICATION FETCHES

then DETACH all engine=Kafka, engine=Nats, and engine=RabbitMQ tables, and ATTACH them back followed by

SYSTEM START MERGES
SYSTEM START REPLICATION FETCHES

when the backup completes. This would have some side effects, though, such as replication lag.
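
If you want to automate this around the Velero backup, backup hooks might be an option, though we have not tested this ourselves. A rough sketch (the clickhouse-pod container name and the clickhouse.altinity.com/app: chop label are the operator defaults; add the other SYSTEM statements and the DETACH/ATTACH steps the same way, and pass user/password to clickhouse-client if your default user needs them):

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: clickhouse-quiesced
  namespace: velero
spec:
  includedNamespaces:
    - <your-clickhouse-namespace>
  defaultVolumesToFsBackup: true
  hooks:
    resources:
      - name: quiesce-clickhouse
        labelSelector:
          matchLabels:
            clickhouse.altinity.com/app: chop
        # pre hooks run inside the pod before its volumes are backed up
        pre:
          - exec:
              container: clickhouse-pod
              command: ["clickhouse-client", "-q", "SYSTEM STOP MERGES"]
              onError: Fail
              timeout: 60s
        # post hooks run after the pod's volumes have been backed up
        post:
          - exec:
              container: clickhouse-pod
              command: ["clickhouse-client", "-q", "SYSTEM START MERGES"]
              onError: Continue

Please verify that merges (and fetches) are actually restarted after the backup finishes, especially if a hook fails.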

Slach avatar Mar 03 '24 17:03 Slach

What do you mean by detaching the engine?

I don't use any engine=Kafka, engine=Nats, or engine=RabbitMQ tables. Do I need to detach ReplicatedMergeTree and Distributed tables?

Also, is it

SYSTEM START REPLICATION FETCHES
or
SYSTEM START FETCHES

manishtradebyte avatar Mar 04 '24 23:03 manishtradebyte

I mean: execute DETACH TABLE db.kafka_table / ATTACH TABLE db.kafka_table to stop and resume the background consumers for Kafka, NATS, and RabbitMQ tables.

I don't know of anything like SYSTEM STOP MESSAGING BROKER.

Slach avatar Mar 05 '24 04:03 Slach

I tried to take a backup of the PVCs using Velero for:

  1. clickhouse
  2. zookeeper

Then I deleted everything, restored the PVCs from the backup, and deployed ZooKeeper and ClickHouse (CHI) again.
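
The restore itself was essentially just a Velero Restore pointing at that backup and limited to the volume objects; roughly like this (again illustrative names, not my exact manifest):

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: clickhouse-and-zookeeper-restore
  namespace: velero
spec:
  backupName: clickhouse-and-zookeeper
  includedNamespaces:
    - clickhouse-backup
  # restore only the volume objects; the ZooKeeper and CHI manifests were re-applied manually
  includedResources:
    - persistentvolumeclaims
    - persistentvolumes
  restorePVs: true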

Everything seems to work fine, but when I drop a database from the restored cluster, the replica paths of its tables do not seem to get deleted from ZooKeeper. This leads to an error when I try to recreate the same table again.

manishtradebyte avatar Mar 21 '24 14:03 manishtradebyte

@manishtradebyte did you use DROP DATABASE db SYNC?

Slach avatar Mar 22 '24 07:03 Slach

No... When should I run this after restoring the backup?

manishtradebyte avatar Mar 22 '24 08:03 manishtradebyte

How exactly did you "delete everything"?

Slach avatar Mar 22 '24 08:03 Slach

Basically, I deleted the cluster and its PVCs, and removed the ZooKeeper deployment and its PVCs.

manishtradebyte avatar Mar 22 '24 09:03 manishtradebyte

@manishtradebyte thereafter, did you restore the PVCs + ZK manifests + ClickHouse manifests with Velero, or did you restore only the PVCs with Velero and re-deploy the manifests manually?

Slach avatar Mar 22 '24 10:03 Slach

I just restored the PVCs using Velero for both ZK and CHI.

Then I deployed the manifests manually.

manishtradebyte avatar Mar 22 '24 10:03 manishtradebyte

In this case, clickhouse-operator will try to restore the schema during restoration. But it is weird that you receive "replica path already exists", because /var/lib/clickhouse/metadata should be mounted from the PVC.

Could you share your generated ClickHouse pod manifest in YAML format?

kubectl get pod -n <your-ns> pod-name-0-0-0 -o yaml

Slach avatar Mar 22 '24 10:03 Slach

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-03-22T10:11:30Z"
  generateName: chi-clickhouse-cluster_name-0-0-
  labels:
    clickhouse.altinity.com/app: chop
    clickhouse.altinity.com/chi: clickhouse
    clickhouse.altinity.com/cluster: cluster_name
    clickhouse.altinity.com/namespace: clickhouse-backup
    clickhouse.altinity.com/ready: "yes"
    clickhouse.altinity.com/replica: "0"
    clickhouse.altinity.com/shard: "0"
    controller-revision-hash: chi-clickhouse-cluster_name-0-0-55dfd6875
    statefulset.kubernetes.io/pod-name: chi-clickhouse-cluster_name-0-0-0
  name: chi-clickhouse-cluster_name-0-0-0
  namespace: clickhouse-backup
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: chi-clickhouse-cluster_name-0-0
    uid: 3c1d2074-6241-44c7-b3f2-db7b8e5e5bd1
  resourceVersion: "170085224"
  uid: 23d437b1-cc7a-4aa8-8cd5-5e6b6984a1fa
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - eu-central-1a
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            clickhouse.altinity.com/app: chop
            clickhouse.altinity.com/chi: clickhouse
            clickhouse.altinity.com/namespace: clickhouse-backup
        topologyKey: kubernetes.io/hostname
  containers:
  - image: clickhouse/clickhouse-server:24.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 10
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: clickhouse-pod
    ports:
    - containerPort: 9000
      name: tcp
      protocol: TCP
    - containerPort: 8123
      name: http
      protocol: TCP
    - containerPort: 9009
      name: interserver
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "2"
        memory: 6Gi
      requests:
        cpu: "1"
        memory: 4Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/clickhouse-server/config.d/
      name: chi-clickhouse-common-configd
    - mountPath: /etc/clickhouse-server/users.d/
      name: chi-clickhouse-common-usersd
    - mountPath: /etc/clickhouse-server/conf.d/
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    - mountPath: /var/lib/clickhouse
      name: default
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-c6bp4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostAliases:
  - hostnames:
    - chi-clickhouse-cluster_name-0-0
    ip: 127.0.0.1
  hostname: chi-clickhouse-cluster_name-0-0-0
  nodeName: ip-10-64-195-208.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: chi-clickhouse-cluster_name-0-0
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default
    persistentVolumeClaim:
      claimName: default-chi-clickhouse-cluster_name-0-0-0
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-configd
    name: chi-clickhouse-common-configd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-common-usersd
    name: chi-clickhouse-common-usersd
  - configMap:
      defaultMode: 420
      name: chi-clickhouse-deploy-confd-cluster_name-0-0
    name: chi-clickhouse-deploy-confd-cluster_name-0-0
  - name: kube-api-access-c6bp4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:13:05Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-03-22T10:12:33Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://a992fe1658c93fb0972d3577b613bc1a3cc324008d5fd291726ce0383f25fb0f
    image: docker.io/clickhouse/clickhouse-server:24.1
    imageID: docker.io/clickhouse/clickhouse-server@sha256:7029f00d469e0d5d32f6c2dd3c5fd9110344b5902b4401c05da705a321e3fc86
    lastState: {}
    name: clickhouse-pod
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-03-22T10:12:53Z"
  hostIP: 10.64.195.208
  phase: Running
  podIP: 10.64.195.158
  podIPs:
  - ip: 10.64.195.158
  qosClass: Burstable
  startTime: "2024-03-22T10:12:33Z"

manishtradebyte avatar Mar 22 '24 11:03 manishtradebyte

- mountPath: /var/lib/clickhouse
  name: default

/var/lib/clickhouse/metadata/ should contain your existing tables, and they should be attached when you apply the manifests.

Could you share the output of

kubectl describe chi -n clickhouse-backup clickhouse

Where is your operator installed?

kubectl get deployment --all-namespaces | grep clickhouse-operator

Slach avatar Mar 22 '24 11:03 Slach

The operator is installed in the same namespace.

I tried to apply the restore again and it seems to work. The tables are created, and when I drop them I can recreate them as well.

You can close this issue if you want.

Also, it would be great if https://github.com/Altinity/clickhouse-backup/issues/860 gets resolved. I am only using Velero because clickhouse-backup doesn't work.

Thanks a lot

manishtradebyte avatar Mar 22 '24 13:03 manishtradebyte