velero-plugin-for-vsphere icon indicating copy to clipboard operation
velero-plugin-for-vsphere copied to clipboard

failed to restore with error: "Failed to create a CloneFromSnapshot CR"

Open wanghuok opened this issue 3 years ago • 5 comments

Describe the bug

restore failed

To Reproduce

  1. setup tkgi 1.13
  2. create a k8s cluster
  3. install velero and velero-plugin-for-vsphere:v1.3.0-rc4
  4. create 10 namespaces, 100 statefulsets with 1g pv each
  5. successfully to backup 10 namespaces , 2 namespaces each time
  6. delete all 10 namespaces
  7. failed to restore from previous backups:

Expected behavior

restore successfully

Troubleshooting Information

[Please refer to the Troubleshooting page and collect the required information]

found some errors like this by "velero restore logs ": time="2021-11-23T08:11:17Z" level=error msg="Failed to create a CloneFromSnapshot CR: Phase=Failed, err=cloneFromSnapshot: Failed at calling SnapshotManager CreateVolumeFromSnapshotWithMetadata with peId pvc:velero-1g-2/appdb-postgres-wl-12-0, err: Create download cr failed.\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr.(*SnapshotManager).CreateVolumeFromSnapshot.func2\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:701\nk8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:217\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:230\nk8s.io/apimachinery/pkg/util/wait.WaitForWithContext\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:657\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:591\nk8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:563\nk8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:552\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr.(*SnapshotManager).CreateVolumeFromSnapshot\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:685\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr.DataManagerProtectedEntity.Overwrite\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/data_manager_protected_entity.go:21\ngithub.com/vmware-tanzu/astrolabe/pkg/pvc.(*PVCProtectedEntityTypeManager).CreateFromMetadata\n\t/go/pkg/mod/github.com/vmware-tanzu/[email protected]/pkg/pvc/pvc_protected_entity_type_manager.go:227\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr.(*SnapshotManager).CreateVolumeFromSnapshotWithMetadata\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:776\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshot\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller.go:311\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).syncCloneFromSnapshotByKey\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:617\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshotWorker\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:562\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nFailed to get the source snapshot PE for peID ivd:12faa9ba-bb5c-4d31-b860-dd7c58f911a7:963f2a93-d505-434b-a2f8-134075a65177\ngithub.com/vmware-tanzu/astrolabe/pkg/pvc.(*PVCProtectedEntityTypeManager).CreateFromMetadata\n\t/go/pkg/mod/github.com/vmware-tanzu/[email protected]/pkg/pvc/pvc_protected_entity_type_manager.go:231\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr.(*SnapshotManager).CreateVolumeFromSnapshotWithMetadata\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:776\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshot\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller.go:311\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).syncCloneFromSnapshotByKey\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:617\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshotWorker\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:562\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nError creating volume from metadata\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr.(*SnapshotManager).CreateVolumeFromSnapshotWithMetadata\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:779\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshot\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller.go:311\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).syncCloneFromSnapshotByKey\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:617\ngithub.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshotWorker\n\t/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:562\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371" cmd=/plugins/velero-plugin-for-vsphere logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/plugin/restore_pvc_action_plugin.go:208" pluginName=velero-plugin-for-vsphere restore=velero/hu-rs-1 it said " Failed to get the source snapshot PE for peID ivd:12faa9ba-bb5c-4d31-b860-dd7c58f911a7:963f2a93-d505-434b-a2f8-134075a65177". but i checked the ivd is exists.

found some erros like this by "velero restore describe --details": error preparing persistentvolumeclaims/velero-1g-2/appdb-postgres-wl-72-0: rpc error: code = Unknown desc = Failed to create a CloneFromSnapshot CR: Phase=Failed, err=cloneFromSnapshot: Failed at calling SnapshotManager CreateVolumeFromSnapshotWithMetadata with peId pvc:velero-1g-2/appdb-postgres-wl-72-0, err: PVC "appdb-postgres-wl-72-0" did not become Bound: PersistentVolumeClaims [appdb-postgres-wl-72-0] not all in phase Bound within 3m0s Error creating volume from metadata github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr.(*SnapshotManager).CreateVolumeFromSnapshotWithMetadata /go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:779 github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshot /go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller.go:311 github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).syncCloneFromSnapshotByKey /go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:617 github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver.(*backupDriverController).cloneFromSnapshotWorker /go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:562 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 k8s.io/apimachinery/pkg/util/wait.BackoffUntil /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 k8s.io/apimachinery/pkg/util/wait.JitterUntil /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 k8s.io/apimachinery/pkg/util/wait.Until /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90 runtime.goexit /usr/local/go/src/runtime/asm_amd64.s:1371

[If applicable, add screenshots to help explain your problem.]

[Miscellaneous information that will assist in solving the issu hu-rs-1-details.log hu-rs-1.log e]

wanghuok avatar Dec 01 '21 01:12 wanghuok

@wanghuok Also I've encountered this error, I just retry the restore action, the restore is successful without error.

qiuming-best avatar Jan 08 '22 03:01 qiuming-best

Hi dev, any update on this issue?

liyan-su avatar Jan 10 '22 10:01 liyan-su

It appears that there are mainly 2 kinds of errors from the attached logs:

  1. Create download cr failed. This was identified earlier, and fixed in https://github.com/vmware-tanzu/velero-plugin-for-vsphere/pull/434. The fix is available in 1.3.1, would you be willing to try it out?

  2. PVC did not become Bound: PersistentVolumeClaims not all in phase Bound within 3m0s Please provide the csi driver logs for this.

Our test team reported these errors in the past, but did not see the issue in 1.3.1 plugin version. If you do observe the issue please attach the backup-driver and data-manager logs as well.

deepakkinni avatar Jan 10 '22 18:01 deepakkinni

Thanks @deepakkinni reply, we cann't provide backup-driver and data-manager logs due to we don't have the environment yet, we will verify this issue in tkgi1.14 if we can. thanks

liyan-su avatar Jan 10 '22 22:01 liyan-su

It appears that there are mainly 2 kinds of errors from the attached logs:

  1. Create download cr failed. This was identified earlier, and fixed in Retry if there are errors retrieving download CR #434. The fix is available in 1.3.1, would you be willing to try it out?
  2. PVC did not become Bound: PersistentVolumeClaims not all in phase Bound within 3m0s Please provide the csi driver logs for this.

Our test team reported these errors in the past, but did not see the issue in 1.3.1 plugin version. If you do observe the issue please attach the backup-driver and data-manager logs as well.

I also hit the issue, and here is the logs you asked: error msg from data-manager pod logs: time="2022-10-11T09:58:39Z" level=info msg="vCenter connected, proceeding with operation" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/ivd/ivd_protected_entity_type_manager.go:309" 2022-10-11T09:58:39.230Z warning -[00025] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3 2022-10-11T09:58:39.243Z warning -[00013] [Originator@6876 sub=vimaccess] cannot get thumbprint: SSL error code '151441516', exception: 'Wrong X.509 Certificate format' 2022-10-11T09:58:39.570Z warning -[00021] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3 2022-10-11T09:58:39.582Z warning -[00013] [Originator@6876 sub=vimaccess] cannot get thumbprint: SSL error code '151441516', exception: 'Wrong X.509 Certificate format' ... time="2022-10-11T09:58:41Z" level=error msg="Failed at copying to remote repository" Local PEID="ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8" error="Failed to delete peinfo from bucket "velero": Unable to delete object "plugins/vsphere-astrolabe-repo/ivd/peinfo/ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8" from bucket "velero": RequestError: send request failed\ncaused by: Get "https://10.221.138.*:9000/velero?prefix=plugins%2Fvsphere-astrolabe-repo%2Fivd%2Fpeinfo%2Fivd%3Aa1ab225c-b98d-4f38-9200-9f0584cff5c5%3A098881e4-a9f4-4892-80f5-3c638932a0a8": x509: certificate signed by unknown authority" error.file="/go/pkg/mod/github.com/vmware-tanzu/[email protected]/pkg/s3repository/repository_protected_entity.go:183" error.function=github.com/vmware-tanzu/astrolabe/pkg/s3repository.ProtectedEntity.deleteSnapshotComponents logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/dataMover/data_mover.go:120" ...

error msg from backup-driver pod logs:

time="2022-10-11T10:01:59Z" level=error msg="Failed to get the source snapshot PE for peID ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8" error="GetObject failed for bucket velero, key plugins/vsphere-astrolabe-repo/ivd/peinfo/ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8: RequestError: send request failed\ncaused by: Get "https://10.221.138.*:9000/velero/plugins/vsphere-astrolabe-repo/ivd/peinfo/ivd%3Aa1ab225c-b98d-4f38-9200-9f0584cff5c5%3A098881e4-a9f4-4892-80f5-3c638932a0a8": x509: certificate signed by unknown authority" error.file="/go/pkg/mod/github.com/vmware-tanzu/[email protected]/pkg/s3repository/repository_protected_entity_type_manager.go:176" error.function="github.com/vmware-tanzu/astrolabe/pkg/s3repository.(ProtectedEntityTypeManager).GetProtectedEntity" logSource="/go/pkg/mod/github.com/vmware-tanzu/[email protected]/pkg/pvc/pvc_protected_entity_type_manager.go:222" time="2022-10-11T10:01:59Z" level=error msg="Error creating volume from metadata" error="Failed to get the source snapshot PE for peID ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8: GetObject failed for bucket velero, key plugins/vsphere-astrolabe-repo/ivd/peinfo/ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8: RequestError: send request failed\ncaused by: Get "https://10.221.138.:9000/velero/plugins/vsphere-astrolabe-repo/ivd/peinfo/ivd%3Aa1ab225c-b98d-4f38-9200-9f0584cff5c5%3A098881e4-a9f4-4892-80f5-3c638932a0a8": x509: certificate signed by unknown authority" error.file="/go/pkg/mod/github.com/vmware-tanzu/[email protected]/pkg/s3repository/repository_protected_entity_type_manager.go:176" error.function="github.com/vmware-tanzu/astrolabe/pkg/s3repository.(ProtectedEntityTypeManager).GetProtectedEntity" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/snapshotmgr/snapshot_manager.go:778" time="2022-10-11T10:01:59Z" level=error msg="cloneFromSnapshot nginx-example/577b07bd-a3f0-4341-b29e-94060e2fbb23 failed: Error creating volume from metadata: Failed to get the source snapshot PE for peID ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8: GetObject failed for bucket velero, key plugins/vsphere-astrolabe-repo/ivd/peinfo/ivd:a1ab225c-b98d-4f38-9200-9f0584cff5c5:098881e4-a9f4-4892-80f5-3c638932a0a8: RequestError: send request failed\ncaused by: Get "https://10.221.138.:9000/velero/plugins/vsphere-astrolabe-repo/ivd/peinfo/ivd%3Aa1ab225c-b98d-4f38-9200-9f0584cff5c5%3A098881e4-a9f4-4892-80f5-3c638932a0a8": x509: certificate signed by unknown authority" controller=BackupDriverController logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/backupdriver/backup_driver_controller_base.go:619"

jhuisss avatar Oct 11 '22 10:10 jhuisss