
K8SSAND-1306 ⁃ CassandraRestore manifest does not restore from a gcs backup into another k8ssandra cluster. Fails to identify the backup reference.

Open · alokhom opened this issue 2 years ago · 2 comments

Bug Report

I am trying to restore a k8ssandra cluster from a multiTenant GCS backup into an AKS k8ssandra cluster using a CassandraRestore manifest, and it is failing. I am providing a valid CassandraBackup backup reference ID that is visible in GCS.

  1. The backup was taken from the GKE cluster via a Medusa Cassandra backup (successful, visible in GCS).
  2. The cluster design on the source and target k8ssandra clusters is the same; only the cluster names differ.

Describe the bug

My Setup:

  1. I have one k8ssandra cluster on GKE with Medusa installed to back up to GCS. That works successfully via a CronJob. (k8ssandra Helm chart version 1.4.1)
  2. I have another k8ssandra restore cluster on AKS with Medusa; the idea is to restore the backup from GCS. It connects to the GCS storage successfully per the Medusa container logs. (k8ssandra Helm chart version 1.5.1-snapshot, latest: https://github.com/k8ssandra/k8ssandra/blob/1c066b9ff0b512ee6e98e1bb0fd7b58cb52fe5b1/charts/k8ssandra/Chart.yaml#L6)
  3. Both clusters use the same Cassandra YAML manifest for deployment; only the k8ssandra cluster name differs. I have attached all manifests and references. The values are below; a sketch of the storage secret they reference follows them.
    cassandra:
      cassandraLibDirVolume:
        storageClass: managed-premium
        size: 100Gi
      heap:
        size: 1G
        newGenSize: 1G
      resources:
        requests:
          cpu: 1
          memory: 2Gi
        limits:
          cpu: 1
          memory: 3Gi
      clusterName: restore-cluster
      auth:
        enabled: false
      datacenters:
        - name: dc1
          size: 3
          racks:
            - name: rack1
            - name: rack2
            - name: rack3
    stargate:
      enabled: false

    medusa:
      enabled: true
      multiTenant: true
      storage: google_storage
      bucketName: gcloudstagingXXXXXXXXXXXXX
      storageSecret: storage-s3-json
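
The storageSecret named in the values above must exist in the cluster and hold the GCP service-account credentials Medusa uses to reach the bucket. A minimal sketch of such a secret, assuming the key name medusa_gcp_key.json (the exact key name the chart expects is an assumption here; check the k8ssandra Medusa documentation):

    apiVersion: v1
    kind: Secret
    metadata:
      name: storage-s3-json
      namespace: cassandra
    type: Opaque
    stringData:
      # Assumed key name; the chart docs define the exact key Medusa reads.
      medusa_gcp_key.json: |
        { "type": "service_account", "project_id": "..." }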
  4. The CassandraRestore fires and the medusa-operator logs show the error below. It is pointing at the AKS cluster's Cassandra restore-cluster. The restore manifest and the command that stamps the timestamp into it follow; a sketch of the backup-side manifest it references comes after them.
    apiVersion: cassandra.k8ssandra.io/v1alpha1
    kind: CassandraRestore
    metadata:
      name: medusa-restoredaily-timestamp
      namespace: cassandra
    spec:
      backup: medusa-daily-timestamp
      inPlace: true
      shutdown: false
      cassandraDatacenter:
        name: dc1
        clusterName: restore-cluster
# Stamp the backup timestamp into the manifest's name and backup reference, then apply it.
export backupTimestamp="20220301155128"
sed "s/timestamp/${backupTimestamp}/g" cassandra_restore.yaml | kubectl apply -f -
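
For reference, the backup side would have applied a CassandraBackup manifest stamped the same way, producing the name that the restore's spec.backup points to. A minimal sketch, assuming the v1alpha1 shape (spec.name and spec.cassandraDatacenter are my assumption of the fields; check the CRD):

    apiVersion: cassandra.k8ssandra.io/v1alpha1
    kind: CassandraBackup
    metadata:
      name: medusa-daily-20220301155128
      namespace: cassandra
    spec:
      # Backup name as recorded by Medusa in the GCS bucket.
      name: medusa-daily-20220301155128
      cassandraDatacenter: dc1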

medusa operator logs

2022-03-02T11:33:35.321Z	ERROR	controller-runtime.manager.controller.cassandrarestore	Reconciler error	{"reconciler group": "cassandra.k8ssandra.io", "reconciler kind": "CassandraRestore", "name": "medusa-restoredaily-20220301155128", "namespace": "cassandra", "error": "CassandraBackup.cassandra.k8ssandra.io \"medusa-daily-20220301155128\" not found"}
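
The error is a plain Kubernetes "not found": the restore controller evidently resolves spec.backup by fetching a CassandraBackup object from the local API server, and that object exists only in the GKE cluster where the backup ran, not in the AKS cluster. A quick way to confirm what the AKS cluster holds (standard kubectl; the resource plural is my inference from the kind and group in the error):

    kubectl -n cassandra get cassandrabackups.cassandra.k8ssandra.io

In this setup the list would not contain medusa-daily-20220301155128, which matches the reconciler error above.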

To Reproduce

Steps to reproduce the behavior:

  1. Fire the CassandraBackup manifest from the GKE cluster to GCS. Logs attached from the operator and from the Medusa containers of the 3 rack pods; standard k8ssandra Helm chart deployment.
  2. Fire the CassandraRestore manifest on the AKS cluster pointing at the GCS backup. Logs attached.

Expected behavior

The Medusa container logs of the restore k8ssandra cluster show that they connect and listen successfully. The CassandraRestore points to the right backup ID of the CassandraBackup and should restore it.

Screenshots

Backup in the GCS store (screenshot attached).

Environment

  • Helm charts version info
    - backing-up cluster: GKE k8ssandra cluster, k8ssandra Helm chart version 1.4.1
    - restoring cluster: AKS k8ssandra cluster, k8ssandra Helm chart version 1.5.0-snapshot
  • Helm charts user-supplied values: shared above (values.yaml). It is the same YAML for both clusters, except that the GKE cluster name is main-cluster and the AKS cluster name is restore-cluster.
  • Kubernetes version information:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9", GitCommit:"37f338aa38e0427e127162afe462e2f4150f0ba3", GitTreeState:"clean", BuildDate:"2022-02-07T20:49:26Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

$ helm version
version.BuildInfo{Version:"v3.7.1", GitCommit:"1d11fcb5d3f3bf00dbe6fe31b8412839a96b3dc4", GitTreeState:"clean", GoVersion:"go1.16.9"}
  • Kubernetes cluster kind: we are using GKE and AKS clusters

Additional context: logs attached.

┆Issue is synchronized with this Jira Task by Unito ┆friendlyId: K8SSAND-1306 ┆priority: Medium

alokhom — Mar 02 '22

Hey @alokhom, restoring one cluster from another's backup isn't currently supported. Remote restore is a project that we're currently working on, though.

You can check out some of that project here if you're interested:

https://k8ssandra.atlassian.net/browse/K8SSAND-515

jdonenine — Mar 02 '22

OK. When are you expecting to release the remote restoration feature?

Currently I have tested that only same-cluster backups/restores are possible. But that leads to cloud lock-in. Do you have any ideas on how to avoid cloud lock-in using k8ssandra?

In k8ssandra, what types of restores are possible? For example:

  1. Can I do a GKE-to-GKE (or AKS-to-AKS) restore across different k8ssandra clusters with the same YAML manifest but different cluster names?
  2. Can I do a GKE-to-GKE (or AKS-to-AKS) restore across different k8ssandra clusters with the same YAML manifest and the same cluster names?

alokhom — Mar 02 '22