
failed to complete restore [..] no snapshot found

Open sgielen opened this issue 3 years ago • 10 comments

I am trying to validate my Stash backups by performing a restore to a different namespace. However, I am encountering the "no snapshot found" error during restore.

First, I performed a backup to Backblaze B2, and the snapshot seems to have been created successfully: the BackupSession is successful; in the bucket, /stash/gitea/snapshots/<long filename> exists; the /bin/restic check in the backup container logs indicates no errors were found. Also, SNAPSHOT-COUNT is 1:

$ kubectl get repository -n stash gitea-to-b2
NAME          INTEGRITY   SIZE         SNAPSHOT-COUNT   LAST-SUCCESSFUL-BACKUP   AGE
gitea-to-b2   true        11.254 MiB   1                101m                     106m

So, I created a new namespace gitea-restore and created the PVC, an Ingress with a different name, a RestoreSession according to https://stash.run/docs/v2022.07.09/guides/use-cases/cross-cluster-backup/, and the StatefulSet itself. The RestoreSession contains the same Repository as the BackupConfiguration; its YAML is below. Indeed, Stash created an init container in the StatefulSet, whose logs contain:

I0926 10:38:05.508833       1 restore.go:116] Got leadership, preparing for restore
I0926 10:38:05.572293       1 commands.go:233] Restoring backed up data
[golang-sh]$ /bin/restic restore latest --path /data --host host-0 --target / --cache-dir /tmp/restic-cache
latest snapshot for criteria not found: no snapshot found Paths:[/data] Hosts:[host-0]
I0926 10:38:10.950152       1 status.go:192] Updating hosts status for restore target StatefulSet gitea-restore/gitea.
F0926 10:38:11.092775       1 restore.go:123] failed to complete restore. Reason: latest snapshot for criteria not found: no snapshot found Paths:[/data] Hosts:[host-0]
[......goroutine stacks....]
I0926 10:38:11.092905       1 restore.go:129] Lost leadership
I0926 10:38:11.096042       1 restore.go:98] Restore completed successfully for RestoreSession gitea-restore/restore
I0926 10:38:11.096111       1 main.go:45] Exiting Stash Main

Two things jump out to me. First of all: the init container failed to restore, but it still said "restore completed successfully" and exited 0. So, now, my Gitea container is running without data, while I had expected the init container to fail and further setup of the application to block.

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 26 Sep 2022 12:38:04 +0200
      Finished:     Mon, 26 Sep 2022 12:38:11 +0200

But, worse still, the restore itself failed. With the RestoreSession pointing at the same Repository as the BackupSession (gitea-to-b2 in the namespace stash), I would have expected it to find the exact same backups in the same directory in the same bucket, but it finds no snapshots. I tried to take a look at the contents of the snapshot, but it's binary gibberish so I don't know what's in it. Could you help me figure out why this is / what's going on?

Here's the RestoreSession YAML:

apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
  name: restore
  namespace: gitea-restore
spec:
  repository:
    name: gitea-to-b2
    namespace: stash
  target: # target indicates where the recovered data will be stored
    ref:
      apiVersion: apps/v1
      kind: StatefulSet
      name: gitea
    volumeMounts:
    - mountPath: /data
      name: data
    rules:
    - paths:
      - /data

sgielen avatar Sep 26 '22 11:09 sgielen

Hello @sgielen! Can you show the YAML of the BackupConfiguration?

hmsayem avatar Sep 26 '22 11:09 hmsayem

Yes, here it is:

$ kubectl get backupconfiguration -n gitea -o yaml backup
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"stash.appscode.com/v1beta1","kind":"BackupConfiguration","metadata":{"annotations":{},"name":"backup","namespace":"gitea"},"spec":{"backupHistoryLimit":3,"driver":"Restic","paused":false,"repository":{"name":"gitea-to-b2","namespace":"stash"},"retentionPolicy":{"keepLast":2,"name":"keep-last-2","prune":true},"runtimeSettings":{"container":{"securityContext":{"runAsGroup":0,"runAsUser":0}}},"schedule":"16 2 * * *","target":{"alias":"gitea","paths":["/data"],"ref":{"apiVersion":"apps/v1","kind":"StatefulSet","name":"gitea"},"volumeMounts":[{"mountPath":"/data","name":"data"}]},"tempDir":{"medium":"Memory","sizeLimit":"256Mi"},"timeOut":"1h"}}
  creationTimestamp: "2022-09-26T09:11:59Z"
  finalizers:
  - stash.appscode.com
  generation: 1
  name: backup
  namespace: gitea
  resourceVersion: "1245525"
  uid: adcd0ed8-99f4-4335-bc71-9b1f2b0aeb07
spec:
  backupHistoryLimit: 3
  driver: Restic
  paused: false
  repository:
    name: gitea-to-b2
    namespace: stash
  retentionPolicy:
    keepLast: 2
    name: keep-last-2
    prune: true
  runtimeSettings:
    container:
      securityContext:
        runAsGroup: 0
        runAsUser: 0
  schedule: 16 2 * * *
  target:
    alias: gitea
    paths:
    - /data
    ref:
      apiVersion: apps/v1
      kind: StatefulSet
      name: gitea
    volumeMounts:
    - mountPath: /data
      name: data
  tempDir:
    medium: Memory
    sizeLimit: 256Mi
  timeOut: 1h
status:
  conditions:
  - lastTransitionTime: "2022-09-26T09:12:00Z"
    message: Repository stash/gitea-to-b2 exist.
    reason: RepositoryAvailable
    status: "True"
    type: RepositoryFound
  - lastTransitionTime: "2022-09-26T09:12:00Z"
    message: Backend Secret stash/b2-secret exist.
    reason: BackendSecretAvailable
    status: "True"
    type: BackendSecretFound
  - lastTransitionTime: "2022-09-26T09:12:00Z"
    message: Successfully validated.
    reason: ResourceValidationPassed
    status: "True"
    type: ValidationPassed
  - lastTransitionTime: "2022-09-26T09:12:01Z"
    message: Backup target StatefulSet gitea/gitea found.
    reason: TargetAvailable
    status: "True"
    type: BackupTargetFound
  - lastTransitionTime: "2022-09-26T09:12:01Z"
    message: Successfully created backup triggering CronJob.
    reason: CronJobCreationSucceeded
    status: "True"
    type: CronJobCreated
  - lastTransitionTime: "2022-09-26T09:12:01Z"
    message: Successfully injected stash sidecar into StatefulSet gitea/gitea
    reason: SidecarInjectionSucceeded
    status: "True"
    type: StashSidecarInjected
  observedGeneration: 1
  phase: Ready

sgielen avatar Sep 26 '22 11:09 sgielen

What are the mount paths provided in the backup and restore volumes? Are they the same?

hmsayem avatar Sep 26 '22 12:09 hmsayem

Could you please provide some information about your Snapshot?

To list your Snapshots, run the following command: kubectl get snapshots -n backup

Stash should return the successful Snapshots in the backup namespace after executing the command.

Please provide a YAML of a Snapshot from the list. To get the snapshot YAML, run the command: kubectl get snapshots -n backup <snapshot-name> -o yaml

piyush1146115 avatar Sep 26 '22 12:09 piyush1146115

What are the mount paths provided in the backup and restore volumes? Are they the same?

I think so; RestoreSession:

  target:
    ref:
      apiVersion: apps/v1
      kind: StatefulSet
      name: gitea
    rules:
    - paths:
      - /data
    volumeMounts:
    - mountPath: /data
      name: data

BackupConfiguration:

  target:
    alias: gitea
    paths:
    - /data
    ref:
      apiVersion: apps/v1
      kind: StatefulSet
      name: gitea
    volumeMounts:
    - mountPath: /data
      name: data

sgielen avatar Sep 26 '22 13:09 sgielen

Could you please provide some information about your Snapshot?

To list your Snapshots, run the following command: kubectl get snapshots -n backup

I have no snapshots in either the gitea or the gitea-restore namespace:

$ kubectl get snapshots -n gitea-restore
No resources found in gitea-restore namespace.
$ kubectl get snapshots -n gitea
No resources found in gitea namespace.

Interestingly, kubectl get snapshots -n stash ~~seems to hang forever~~ eventually returned:

NAME                   ID         REPOSITORY    HOSTNAME   CREATED AT
[6 backups for another namespace]
gitea-to-b2-896834ac   896834ac   gitea-to-b2   gitea-0    2022-09-26T09:17:25Z

sgielen avatar Sep 26 '22 13:09 sgielen

One thing that also jumps out to me is that the backup host differs between backup and restore:

BackupSession status.targets[0] contains:

    stats:
    - duration: 19.145936467s
      hostname: gitea-0
      phase: Succeeded
      snapshots:
      - fileStats:
          modifiedFiles: 0
          newFiles: 1650
          totalFiles: 1650
          unmodifiedFiles: 0
        name: 896834ac
        path: /data
        processingTime: "0:14"
        totalSize: 10.776 MiB
        uploaded: 11.254 MiB
    totalHosts: 1

This mentions host gitea-0, while the restore mentions no backups found for host host-0. Could this be related?
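
To compare the host recorded at backup time with the one the restore is looking for, the HOSTNAME column of the snapshot listing can be filtered per Repository. The following is a minimal sketch, not part of Stash: `snapshot_hosts` is a hypothetical helper name, and it assumes the whitespace-separated table layout shown in the `kubectl get snapshots` output earlier in this thread (REPOSITORY in the third column, HOSTNAME in the fourth).

```shell
# Hypothetical helper: print the distinct HOSTNAME values recorded for one
# Repository. Reads the table printed by `kubectl get snapshots` on stdin.
# Assumes columns: NAME ID REPOSITORY HOSTNAME CREATED AT.
snapshot_hosts() {
  repo="$1"
  # Skip the header row, keep rows whose REPOSITORY column matches.
  awk -v repo="$repo" 'NR > 1 && $3 == repo { print $4 }' | sort -u
}

# Usage (namespace and Repository name taken from this thread):
#   kubectl get snapshots -n stash | snapshot_hosts gitea-to-b2
```

If the value printed here differs from the `Hosts:[...]` value in the restore log, the restore is filtering on a host that has no snapshots.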

sgielen avatar Sep 26 '22 14:09 sgielen

Yes, you are right. You used an alias during backup, so you have to use the same alias as the source host during restore. Our documentation should have been clearer about this.

Try this RestoreSession.

apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
  name: restore
  namespace: gitea-restore
spec:
  repository:
    name: gitea-to-b2
    namespace: stash
  target: # target indicates where the recovered data will be stored
    ref:
      apiVersion: apps/v1
      kind: StatefulSet
      name: gitea
    volumeMounts:
    - mountPath: /data
      name: data
    rules:
    - paths:
      - /data
      sourceHost: gitea

hossainemruz avatar Sep 26 '22 14:09 hossainemruz

With sourceHost: gitea-0 (not gitea) it works indeed!

I0926 14:47:32.345827       1 commands.go:412] sh-output: restoring <Snapshot 896834ac of [/data] at 2022-09-26 09:17:25.215975215 +0000 UTC by root@gitea-0> to /

Would you recommend not using an alias in the BackupConfiguration? I don't know exactly where I got it from, but it was somewhere in the documentation.

So this issue #1484 is about two topics:

(1) alias / sourceHost configuration documentation

(2) if restore fails within the init container, it prints an error and a stack trace, but it still exits 0 with "restore completed successfully" which is not expected behavior.

Please let me know if you would prefer me filing a separate issue for either.

sgielen avatar Sep 26 '22 14:09 sgielen

alias is supposed to be used with BatchBackup, where you can back up multiple targets into the same Repository. So, an identifier is necessary to separate their data.

(2) if restore fails within the init container, it prints an error and a stack trace, but it still exits 0 with "restore completed successfully" which is not expected behavior.

Unfortunately, we can't detect whether the restore failed in this scenario. The restore process found no data in the backend for the intended target, so it restored nothing and assumed that it had restored all data.
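
Because the init container's exit code does not reflect this failure, one possible workaround is to scan the restore logs for the fatal message quoted earlier in this thread. The sketch below is an assumption-laden illustration rather than a Stash feature: `restore_log_ok` is a hypothetical helper, and it relies only on the `failed to complete restore` string that appears in the log excerpt above.

```shell
# Hypothetical helper: succeed only if the restore log on stdin does NOT
# contain the fatal message seen in this thread. The container's exit code
# is ignored on purpose, since it was 0 even though the restore failed.
restore_log_ok() {
  if grep -q 'failed to complete restore'; then
    return 1
  fi
  return 0
}

# Usage (pod name assumed from the StatefulSet name; the init container
# name is a placeholder to fill in):
#   kubectl logs -n gitea-restore gitea-0 -c <init-container-name> \
#     | restore_log_ok || echo "restore did not find a snapshot"
```

This only detects the specific message shown above; other failure modes may log different text.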

hossainemruz avatar Sep 26 '22 15:09 hossainemruz