solr-operator

Adding Backup/Restore capabilities.

Open HoustonPutman opened this issue 5 years ago • 10 comments

I imagine that this would be implemented as two additional CRDs, SolrCloudBackup and SolrCloudRestore.

We would likely use the Solr Collections API Backup & Restore capability.

The CRDs would take as input:

  • A volume to store the backup data in, or to retrieve it from
  • An optional list of collections to back up or restore. If omitted, all collections are included
  • The namespace and name of the SolrCloud to back up or restore

Progress and completeness would be conveyed through the Status section of each CRD.
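
As a rough illustration of the inputs listed above, here is a minimal Go sketch. Every type and field name in it (BackupSpec, SolrCloudRef, BackupStatus, VolumeName, and so on) is hypothetical and chosen only for this example; it is not the operator's actual API, and the proposal does not specify how the volume would be referenced.

```go
package main

// NOTE: all names below are hypothetical illustrations of the proposal,
// not types from the solr-operator code base.

// SolrCloudRef identifies the SolrCloud to back up or restore.
type SolrCloudRef struct {
	Namespace string
	Name      string
}

// BackupSpec mirrors the three proposed inputs.
type BackupSpec struct {
	// VolumeName refers to the volume holding the backup data. Referencing
	// it by name is an assumption made for this sketch.
	VolumeName string
	// Collections is optional; an empty list means "all collections".
	Collections []string
	SolrCloud   SolrCloudRef
}

// BackupStatus conveys progress and completeness.
type BackupStatus struct {
	CollectionsDone []string
	Finished        bool
}

// CollectionsToProcess resolves the optional list against the collections
// that currently exist in the cloud.
func CollectionsToProcess(spec BackupSpec, existing []string) []string {
	if len(spec.Collections) == 0 {
		return existing
	}
	return spec.Collections
}

// MarkDone records one finished collection and sets Finished once the
// number of finished collections reaches total.
func (s *BackupStatus) MarkDone(collection string, total int) {
	s.CollectionsDone = append(s.CollectionsDone, collection)
	s.Finished = len(s.CollectionsDone) >= total
}
```

A restore resource could reuse the same shape, with the volume read from instead of written to.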

HoustonPutman avatar Jun 21 '19 16:06 HoustonPutman

I see you merged backup capabilities #14 -- Is there any documentation available about usage/how to perform backups? Thanks!

kcmartin avatar Sep 13 '19 22:09 kcmartin

Hey Kristin,

In general we need to work on documentation, but there's a bit in the README on the requirements to make backups work. I also need to cut a release of the operator, so that you can use the backup functionality. I'll try to beef up the documentation over the next few days and also cut a release.

HoustonPutman avatar Sep 16 '19 16:09 HoustonPutman

Great to hear @HoustonPutman !

kcmartin avatar Sep 16 '19 18:09 kcmartin

Are there any updates on restore functionality? @HoustonPutman

kcmartin avatar Oct 28 '19 23:10 kcmartin

Hey @kcmartin, sorry for the delay on this. I know backups are kind of useless without being able to restore. I think I'll be able to put some work into it early next month. Hopefully it doesn't take too long, as the workflow should basically be the opposite of the backup controller.

Have y'all played around with the backup CRDs yet? Or are you waiting on the restore functionality?

HoustonPutman avatar Oct 29 '19 15:10 HoustonPutman

Hi @HoustonPutman, thanks, I think my teammates have experimented with the backup CRDs a bit... My plan is to wait and do more tests once the restore capability is available. :)

kcmartin avatar Oct 29 '19 17:10 kcmartin

@HoustonPutman We use NFS and encountered a "permission denied" issue. The solr user doesn't have permission to write to the NFS shared volume. We added this init container to the statefulset which resolved the issue:

{
    Name:                     "backup",
    Image:                    solrCloud.Spec.BusyBoxImage.ToImageName(),
    ImagePullPolicy:          solrCloud.Spec.BusyBoxImage.PullPolicy,
    TerminationMessagePath:   "/dev/termination-log",
    TerminationMessagePolicy: "File",
    SecurityContext: &corev1.SecurityContext{
        RunAsUser: &runAsUser,
    },
    Command: []string{"sh", "-c", "chown " + strconv.Itoa(int(fsGroup)) + " " + BaseBackupRestorePath},
    VolumeMounts: []corev1.VolumeMount{
        {
            Name:      BackupRestoreVolume,
            MountPath: BaseBackupRestorePath,
            SubPath:   BackupRestoreSubPathForCloud(solrCloud.Name),
        },
    },
}

and this is in solrcloud.yaml:

backupRestoreVolume:
    nfs:
      path: /
      server: BACKUP_ENDPOINT

It would be great if there was a nicer workaround, but I didn't find any. Does it make sense to add this to the operator?
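
One detail worth noting about the command above: chown with a single numeric ID changes only the owner, so it sets the owner of the backup directory to the fsGroup value. A minimal sketch of a variant that sets owner and group explicitly is below. The helper name chownCommand and its parameters are hypothetical, and which IDs the Solr container actually runs as is an assumption to check against your own security context.

```go
package main

import "strconv"

// chownCommand builds an init-container command that sets both the owner
// and the group of path. The function name and parameters are illustrative
// only; they are not part of the operator.
func chownCommand(uid, gid int64, path string) []string {
	owner := strconv.FormatInt(uid, 10) + ":" + strconv.FormatInt(gid, 10)
	return []string{"sh", "-c", "chown " + owner + " " + path}
}
```

The returned slice could be assigned to the Command field of the init container in place of the inline string concatenation.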

timterle avatar Dec 10 '19 15:12 timterle

That makes sense to me. Does it still allow the Kube Job to copy the backup to S3 or another volume? We might have to change the runAsUser of that job as well.

HoustonPutman avatar Dec 10 '19 20:12 HoustonPutman

I am curious as to backup and restore options. Is there currently any viable way of backing up and restoring a collection? Either at the application level, or maybe by providing an existing PV (in my case, on AWS, I could create these from EBS snapshots, perhaps).

Is "there is no current way to restore these backups" completely true? Or just that there's no method supported by this project? Does the existing backup functionality make use of https://solr.apache.org/guide/8_1/collections-api.html#backup? In which case I could handle the restore myself by retrieving the backup (e.g. from S3) and making a request to the Solr API, perhaps?

I'm really anxious to migrate to this project, rather than using our existing home-grown Solr chart. But for all its faults, it does allow me to quickly restore using EBS snapshots. I would like to understand any potential recovery process that exists now or might exist soon.

plumdog avatar Apr 27 '21 11:04 plumdog

The backup should work, but it is tested less than the other options.

There is no restore method supported by the project. You should still be able to load the backup data into the shared backup/restore PV yourself, then issue the commands to restore via the Collections API. (The operator generates the backup via the Collections API command you linked.)

It is possible, and there have been people that have used the backup functionality, then restored via their own methods.
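
For the manual route, a minimal sketch of building the Collections API request is below. The helper name collectionsAPIURL is hypothetical, and the base URL, backup name and location in the usage note are placeholders; the query parameters (action, name, collection, location) are those of the Collections API BACKUP and RESTORE commands. The location has to be a path that the Solr nodes can read, such as the mounted backup/restore volume.

```go
package main

import "net/url"

// collectionsAPIURL builds a Collections API URL for a BACKUP or RESTORE
// action. The helper itself is illustrative and not part of the operator.
func collectionsAPIURL(baseURL, action, backupName, collection, location string) string {
	params := url.Values{}
	params.Set("action", action)         // "BACKUP" or "RESTORE"
	params.Set("name", backupName)       // name of the backup
	params.Set("collection", collection) // collection to back up, or to restore into
	params.Set("location", location)     // directory on the shared volume
	// Encode sorts the parameters by key and percent-escapes the values.
	return baseURL + "/solr/admin/collections?" + params.Encode()
}
```

For example, collectionsAPIURL("http://example-solrcloud-common", "RESTORE", "nightly", "books", "/var/solr/backup-restore") yields a URL that can be requested with any HTTP client once the backup data has been copied into that location.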

In Solr 9.0, there will be much easier backup/restore functionality that should work natively with S3, and won't require any intervention with the Operator. You will be able to do it entirely via Solr API commands. You can follow that progress here: https://issues.apache.org/jira/browse/SOLR-15086

HoustonPutman avatar Apr 27 '21 18:04 HoustonPutman