velero-plugin-for-vsphere
velero restore create stuck in status "InProgress" when restore pv and pvc only
Describe the bug
"velero restore create" cmd is not working when using flag “--include-resources pv,pvc”, it stucked in status “InProgress”
To Reproduce
First, I successfully created a backup of the whole workload cluster including the PV; see:
# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-de661d9c-40d3-483e-a106-a0c2a3c06eea 50Mi RWO Delete Bound nginx-example/nginx-logs vsphere-csi 4m33s
# kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nginx-example nginx-logs Bound pvc-de661d9c-40d3-483e-a106-a0c2a3c06eea 50Mi RWO vsphere-csi 4m39s
# velero backup create whole-wc
Backup request "whole-wc" submitted successfully.
Run `velero backup describe whole-wc` or `velero backup logs whole-wc` for more details.
# velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
whole-wc Completed 0 1 2022-10-28 06:56:35 +0000 UTC 29d default <none>
Then I manually deleted the PV and PVC to test velero restore with the flag "--include-resources pv,pvc", but the restore is stuck in "InProgress":
# velero restore create --from-backup whole-wc --include-resources pv,pvc --existing-resource-policy update
Restore request "whole-wc-20221028072733" submitted successfully.
Run `velero restore describe whole-wc-20221028072733` or `velero restore logs whole-wc-20221028072733` for more details.
# velero restore describe whole-wc-20221028072733
Name: whole-wc-20221028072733
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: InProgress
Estimated total items to be restored: 2
Items restored so far: 1
Started: 2022-10-28 07:27:33 +0000 UTC
Completed: <n/a>
Backup: whole-wc
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: pv, pvc
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
Existing Resource Policy: update
Preserve Service NodePorts: auto
# kubectl get download download-8b45b2e8-d645-4cdc-a1ac-4e808340cd7c-7a1fd63f-d2f1-4b45-8e3d-e9ef1439ec7e -n velero -o yaml
apiVersion: datamover.cnsdp.vmware.com/v1alpha1
kind: Download
metadata:
creationTimestamp: "2022-10-28T07:27:05Z"
generation: 5
labels:
velero.io/exclude-from-backup: "true"
name: download-8b45b2e8-d645-4cdc-a1ac-4e808340cd7c-7a1fd63f-d2f1-4b45-8e3d-e9ef1439ec7e
namespace: velero
resourceVersion: "11013595"
uid: 48515084-d7cb-4cce-aa94-955d02b56702
spec:
backupRepositoryName: br-20760287-6db0-4d0f-bf4f-9f5ce1f2b239
clonefromSnapshotReference: nginx-example/485ef262-c458-422e-82c9-0aa64555e21d
protectedEntityID: ivd:4c21622c-d402-43b3-baad-da34d9447f68
restoreTimestamp: "2022-10-28T07:27:36Z"
snapshotID: ivd:29d626f0-371c-448f-8de5-99ad961a258d:8b45b2e8-d645-4cdc-a1ac-4e808340cd7c
status:
message: 'Failed to download snapshot, ivd:29d626f0-371c-448f-8de5-99ad961a258d:8b45b2e8-d645-4cdc-a1ac-4e808340cd7c,
from durable object storage. Data copy failed from PE ivd:4c21622c-d402-43b3-baad-da34d9447f68,
to PE ivd:29d626f0-371c-448f-8de5-99ad961a258d:8b45b2e8-d645-4cdc-a1ac-4e808340cd7c:
Open virtual disk file failed. The error code is 1. with error code: 1'
nextRetryTimestamp: "2022-10-28T07:32:41Z"
phase: Retry
processingNode: np1-76b686bbf5-lntdp
progress: {}
retryCount: 2
startTimestamp: "2022-10-28T07:27:36Z"
Expected behavior
velero restore create succeeds and the PV and PVC are restored.
Troubleshooting Information
Velero server version: 1.9.2
AWS plugin version: v1.5.1
vSphere plugin version: v1.4.0
Kubernetes: Vanilla (TKGm workload cluster)
Kubernetes version:
# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8+vmware.2", GitCommit:"d625665700d901401d7504761cf473dd8adfc7b3", GitTreeState:"clean", BuildDate:"2022-07-18T23:04:08Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}
Please attach logs, see instructions at https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/troubleshooting.md#logs
bundle-2022-11-02-11-54-06.tar.gz
Hi, I have attached the whole log bundle, collected with the command "velero debug --restore test-pv-pvc-20221102114058".
What is the "velero backup" command you used? After backup is complete, did you check the status of "Snapshot" and "Upload"? https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/vanilla.md#backup-vsphere-cns-block-volumes
velero backup create whole-wc
I used the command "velero backup create whole-wc" to back up the whole workload cluster.
$ velero backup create whole-wc
Backup request "whole-wc" submitted successfully.
Run `velero backup describe whole-wc` or `velero backup logs whole-wc` for more details.
$ velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
whole-wc Completed 0 1 2022-11-02 15:58:28 +0000 UTC 29d default <none>
$ kubectl get snapshot -A
NAMESPACE NAME AGE
nginx-example snap-70838ef6-80ce-4a0c-b8ec-4e9e49a3116d 44s
$ kubectl get snapshot -n nginx-example snap-70838ef6-80ce-4a0c-b8ec-4e9e49a3116d -o yaml
apiVersion: backupdriver.cnsdp.vmware.com/v1alpha1
kind: Snapshot
metadata:
creationTimestamp: "2022-11-02T15:58:44Z"
generation: 1
labels:
velero.io/backup-name: whole-wc
velero.io/exclude-from-backup: "true"
name: snap-70838ef6-80ce-4a0c-b8ec-4e9e49a3116d
namespace: nginx-example
resourceVersion: "5840374"
uid: c0ddd860-8695-41f5-a3cf-a3414c19dbc9
spec:
backupRepository: br-c1d8671d-a8cd-4e34-9aae-35a776875065
resourceHandle:
apiGroup: ""
kind: PersistentVolumeClaim
name: nginx-logs
status:
completionTimestamp: "2022-11-02T15:58:58Z"
metadata: Cs4LCgpuZ2lueC1sb2dzEgAaDW5naW54LWV4YW1wbGUiACokNmI0OWY0OWUtNjJkZS00NTgwLTk1ZGQtNTljYWNkZjk2YzI0Mgc1ODM5NTY1OABCCAiBpoqbBhAAWgwKA2FwcBIFbmdpbnhixQIKMGt1YmVjdGwua3ViZXJuZXRlcy5pby9sYXN0LWFwcGxpZWQtY29uZmlndXJhdGlvbhKQAnsiYXBpVmVyc2lvbiI6InYxIiwia2luZCI6IlBlcnNpc3RlbnRWb2x1bWVDbGFpbSIsIm1ldGFkYXRhIjp7ImFubm90YXRpb25zIjp7fSwibGFiZWxzIjp7ImFwcCI6Im5naW54In0sIm5hbWUiOiJuZ2lueC1sb2dzIiwibmFtZXNwYWNlIjoibmdpbngtZXhhbXBsZSJ9LCJzcGVjIjp7ImFjY2Vzc01vZGVzIjpbIlJlYWRXcml0ZU9uY2UiXSwicmVzb3VyY2VzIjp7InJlcXVlc3RzIjp7InN0b3JhZ2UiOiI1ME1pIn19LCJzdG9yYWdlQ2xhc3NOYW1lIjoidnNwaGVyZS1jc2kifX0KYiYKH3B2Lmt1YmVybmV0ZXMuaW8vYmluZC1jb21wbGV0ZWQSA3llc2IrCiRwdi5rdWJlcm5ldGVzLmlvL2JvdW5kLWJ5LWNvbnRyb2xsZXISA3llc2JHCi12b2x1bWUuYmV0YS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXISFmNzaS52c3BoZXJlLnZtd2FyZS5jb21iQgoodm9sdW1lLmt1YmVybmV0ZXMuaW8vc3RvcmFnZS1wcm92aXNpb25lchIWY3NpLnZzcGhlcmUudm13YXJlLmNvbXIca3ViZXJuZXRlcy5pby9wdmMtcHJvdGVjdGlvbnoAigHAAgoZa3ViZWN0bC1jbGllbnQtc2lkZS1hcHBseRIGVXBkYXRlGgJ2MSIICIGmipsGEAAyCEZpZWxkc1YxOoACCv0BeyJmOm1ldGFkYXRhIjp7ImY6YW5ub3RhdGlvbnMiOnsiLiI6e30sImY6a3ViZWN0bC5rdWJlcm5ldGVzLmlvL2xhc3QtYXBwbGllZC1jb25maWd1cmF0aW9uIjp7fX0sImY6bGFiZWxzIjp7Ii4iOnt9LCJmOmFwcCI6e319fSwiZjpzcGVjIjp7ImY6YWNjZXNzTW9kZXMiOnt9LCJmOnJlc291cmNlcyI6eyJmOnJlcXVlc3RzIjp7Ii4iOnt9LCJmOnN0b3JhZ2UiOnt9fX0sImY6c3RvcmFnZUNsYXNzTmFtZSI6e30sImY6dm9sdW1lTW9kZSI6e319fUIAigG4AgoXa3ViZS1jb250cm9sbGVyLW1hbmFnZXISBlVwZGF0ZRoCdjEiCAiCpoqbBhAAMghGaWVsZHNWMTr6AQr3AXsiZjptZXRhZGF0YSI6eyJmOmFubm90YXRpb25zIjp7ImY6cHYua3ViZXJuZXRlcy5pby9iaW5kLWNvbXBsZXRlZCI6e30sImY6cHYua3ViZXJuZXRlcy5pby9ib3VuZC1ieS1jb250cm9sbGVyIjp7fSwiZjp2b2x1bWUuYmV0YS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXIiOnt9LCJmOnZvbHVtZS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXIiOnt9fX0sImY6c3BlYyI6eyJmOnZvbHVtZU5hbWUiOnt9fX1CAIoBmAEKF2t1YmUtY29udHJvbGxlci1tYW5hZ2VyEgZVcGRhdGUaAnYxIggIgqaKmwYQADIIRmllbGRzVjE6VQpTeyJmOnN0YXR1cyI6eyJmOmFjY2Vzc01vZGVzIjp7fSwiZjpjYXBhY2l0eSI6eyIuIjp7fSwiZjpzdG9yYWdlIjp7fX0sImY6cGhhc2UiOnt9fX1CBnN0YXR1cxJnCg1SZWFkV3JpdGVPbmNlEhMSEQoHc3RvcmFnZRIGCgQ1ME1pGihwdmMtNmI0OWY0OWUtNjJkZS00NTgwLTk1ZGQtNTljYWNkZjk2YzI0Kgt2c3BoZXJlLWNzaTIKRmlsZXN5c3RlbRopCgVCb3VuZBINUmVhZFdyaXRlT25jZRoRCgdzdG9yYWdlEgYKBDUwTWk=
phase: Uploaded
progress: {}
snapshotID: pvc:nginx-example/nginx-logs:aXZkOjM2MWNmZjMzLTlmZmEtNDM4NS05NGI2LTgzNjAwMzQ4OWYzYjo4MmYzNGNiMy02YTkwLTQxODAtOWI3Yy1hMWI3OTMzZjg4MWE
svcSnapshotName: ""
The Snapshot CR phase is "Uploaded", as shown above.
$ kubectl get upload -A
NAMESPACE NAME AGE
velero upload-82f34cb3-6a90-4180-9b7c-a1b7933f881a 4m24s
$ kubectl get upload upload-82f34cb3-6a90-4180-9b7c-a1b7933f881a -n velero -o yaml
apiVersion: datamover.cnsdp.vmware.com/v1alpha1
kind: Upload
metadata:
creationTimestamp: "2022-11-02T15:58:49Z"
generation: 3
labels:
velero.io/exclude-from-backup: "true"
name: upload-82f34cb3-6a90-4180-9b7c-a1b7933f881a
namespace: velero
resourceVersion: "5840373"
uid: 77e7288c-4568-422b-ab4c-f11651165e31
spec:
backupRepository: br-c1d8671d-a8cd-4e34-9aae-35a776875065
backupTimestamp: "2022-11-02T15:58:49Z"
snapshotID: ivd:361cff33-9ffa-4385-94b6-836003489f3b:82f34cb3-6a90-4180-9b7c-a1b7933f881a
snapshotReference: nginx-example/snap-70838ef6-80ce-4a0c-b8ec-4e9e49a3116d
status:
completionTimestamp: "2022-11-02T15:58:58Z"
message: Upload completed
nextRetryTimestamp: "2022-11-02T15:58:49Z"
phase: Completed
processingNode: np-wc2source-6868bf59c4-nh5sd
progress: {}
startTimestamp: "2022-11-02T15:58:49Z"
The Upload CR phase is "Completed", as shown above.
Continuing with the backup "whole-wc" above, I force-deleted the PVC and PV and then tried to restore them from the backup:
$ velero restore create --from-backup whole-wc --include-resources pv,pvc --existing-resource-policy update
Restore request "whole-wc-20221102161036" submitted successfully.
Run `velero restore describe whole-wc-20221102161036` or `velero restore logs whole-wc-20221102161036` for more details.
$ velero restore describe whole-wc-20221102161036
Name: whole-wc-20221102161036
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: InProgress
Estimated total items to be restored: 2
Items restored so far: 1
Started: 2022-11-02 16:10:36 +0000 UTC
Completed: <n/a>
Backup: whole-wc
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: pv, pvc
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
Existing Resource Policy: update
Preserve Service NodePorts: auto
$ kubectl get clonefromsnapshots -A
NAMESPACE NAME AGE
nginx-example 7b3cfd2e-f529-4ad0-891a-c3b92d2cc762 70s
$ kubectl get clonefromsnapshot 7b3cfd2e-f529-4ad0-891a-c3b92d2cc762 -n nginx-example -o yaml
apiVersion: backupdriver.cnsdp.vmware.com/v1alpha1
kind: CloneFromSnapshot
metadata:
creationTimestamp: "2022-11-02T16:10:36Z"
generation: 1
labels:
velero.io/exclude-from-backup: "true"
name: 7b3cfd2e-f529-4ad0-891a-c3b92d2cc762
namespace: nginx-example
resourceVersion: "5842919"
uid: 0eed650a-66f6-4a6a-918f-cef167970f10
spec:
apiGroup: ""
backupRepository: br-c1d8671d-a8cd-4e34-9aae-35a776875065
cloneCancel: false
kind: PersistentVolumeClaim
metadata: Cs4LCgpuZ2lueC1sb2dzEgAaDW5naW54LWV4YW1wbGUiACokNmI0OWY0OWUtNjJkZS00NTgwLTk1ZGQtNTljYWNkZjk2YzI0Mgc1ODM5NTY1OABCCAiBpoqbBhAAWgwKA2FwcBIFbmdpbnhixQIKMGt1YmVjdGwua3ViZXJuZXRlcy5pby9sYXN0LWFwcGxpZWQtY29uZmlndXJhdGlvbhKQAnsiYXBpVmVyc2lvbiI6InYxIiwia2luZCI6IlBlcnNpc3RlbnRWb2x1bWVDbGFpbSIsIm1ldGFkYXRhIjp7ImFubm90YXRpb25zIjp7fSwibGFiZWxzIjp7ImFwcCI6Im5naW54In0sIm5hbWUiOiJuZ2lueC1sb2dzIiwibmFtZXNwYWNlIjoibmdpbngtZXhhbXBsZSJ9LCJzcGVjIjp7ImFjY2Vzc01vZGVzIjpbIlJlYWRXcml0ZU9uY2UiXSwicmVzb3VyY2VzIjp7InJlcXVlc3RzIjp7InN0b3JhZ2UiOiI1ME1pIn19LCJzdG9yYWdlQ2xhc3NOYW1lIjoidnNwaGVyZS1jc2kifX0KYiYKH3B2Lmt1YmVybmV0ZXMuaW8vYmluZC1jb21wbGV0ZWQSA3llc2IrCiRwdi5rdWJlcm5ldGVzLmlvL2JvdW5kLWJ5LWNvbnRyb2xsZXISA3llc2JHCi12b2x1bWUuYmV0YS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXISFmNzaS52c3BoZXJlLnZtd2FyZS5jb21iQgoodm9sdW1lLmt1YmVybmV0ZXMuaW8vc3RvcmFnZS1wcm92aXNpb25lchIWY3NpLnZzcGhlcmUudm13YXJlLmNvbXIca3ViZXJuZXRlcy5pby9wdmMtcHJvdGVjdGlvbnoAigHAAgoZa3ViZWN0bC1jbGllbnQtc2lkZS1hcHBseRIGVXBkYXRlGgJ2MSIICIGmipsGEAAyCEZpZWxkc1YxOoACCv0BeyJmOm1ldGFkYXRhIjp7ImY6YW5ub3RhdGlvbnMiOnsiLiI6e30sImY6a3ViZWN0bC5rdWJlcm5ldGVzLmlvL2xhc3QtYXBwbGllZC1jb25maWd1cmF0aW9uIjp7fX0sImY6bGFiZWxzIjp7Ii4iOnt9LCJmOmFwcCI6e319fSwiZjpzcGVjIjp7ImY6YWNjZXNzTW9kZXMiOnt9LCJmOnJlc291cmNlcyI6eyJmOnJlcXVlc3RzIjp7Ii4iOnt9LCJmOnN0b3JhZ2UiOnt9fX0sImY6c3RvcmFnZUNsYXNzTmFtZSI6e30sImY6dm9sdW1lTW9kZSI6e319fUIAigG4AgoXa3ViZS1jb250cm9sbGVyLW1hbmFnZXISBlVwZGF0ZRoCdjEiCAiCpoqbBhAAMghGaWVsZHNWMTr6AQr3AXsiZjptZXRhZGF0YSI6eyJmOmFubm90YXRpb25zIjp7ImY6cHYua3ViZXJuZXRlcy5pby9iaW5kLWNvbXBsZXRlZCI6e30sImY6cHYua3ViZXJuZXRlcy5pby9ib3VuZC1ieS1jb250cm9sbGVyIjp7fSwiZjp2b2x1bWUuYmV0YS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXIiOnt9LCJmOnZvbHVtZS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXIiOnt9fX0sImY6c3BlYyI6eyJmOnZvbHVtZU5hbWUiOnt9fX1CAIoBmAEKF2t1YmUtY29udHJvbGxlci1tYW5hZ2VyEgZVcGRhdGUaAnYxIggIgqaKmwYQADIIRmllbGRzVjE6VQpTeyJmOnN0YXR1cyI6eyJmOmFjY2Vzc01vZGVzIjp7fSwiZjpjYXBhY2l0eSI6eyIuIjp7fSwiZjpzdG9yYWdlIjp7fX0sImY6cGhhc2UiOnt9fX1CBnN0YXR1cxJnCg1SZWFkV3JpdGVPbmNlEhMSEQoHc3RvcmFnZRIGCgQ1ME1pGihwdmMtNmI0OWY0OWUtNjJkZS00NTgwLTk1ZGQtNTljYWNkZjk2YzI0Kgt2c3BoZXJlLWNzaTIKRmlsZXN5c3RlbRopCgVCb3VuZBINUmVhZFdyaXRlT25jZRoRCgdzdG9yYWdlEgYKBDUwTWk=
snapshotID: pvc:nginx-example/nginx-logs:aXZkOjM2MWNmZjMzLTlmZmEtNDM4NS05NGI2LTgzNjAwMzQ4OWYzYjo4MmYzNGNiMy02YTkwLTQxODAtOWI3Yy1hMWI3OTMzZjg4MWE
status:
phase: New
$ kubectl get download -A
NAMESPACE NAME AGE
velero download-82f34cb3-6a90-4180-9b7c-a1b7933f881a-03feca3d-04bb-4ec4-9df8-99c549324ea7 94s
$ kubectl get download download-82f34cb3-6a90-4180-9b7c-a1b7933f881a-03feca3d-04bb-4ec4-9df8-99c549324ea7 -n velero -o yaml
apiVersion: datamover.cnsdp.vmware.com/v1alpha1
kind: Download
metadata:
creationTimestamp: "2022-11-02T16:10:38Z"
generation: 5
labels:
velero.io/exclude-from-backup: "true"
name: download-82f34cb3-6a90-4180-9b7c-a1b7933f881a-03feca3d-04bb-4ec4-9df8-99c549324ea7
namespace: velero
resourceVersion: "5843058"
uid: e6b7e52a-4cb3-4799-9582-4fdcfb1feaf2
spec:
backupRepositoryName: br-c1d8671d-a8cd-4e34-9aae-35a776875065
clonefromSnapshotReference: nginx-example/7b3cfd2e-f529-4ad0-891a-c3b92d2cc762
protectedEntityID: ivd:8fcad05a-24d9-43c6-997f-5983a8ccf8f0
restoreTimestamp: "2022-11-02T16:10:38Z"
snapshotID: ivd:361cff33-9ffa-4385-94b6-836003489f3b:82f34cb3-6a90-4180-9b7c-a1b7933f881a
status:
message: 'Failed to download snapshot, ivd:361cff33-9ffa-4385-94b6-836003489f3b:82f34cb3-6a90-4180-9b7c-a1b7933f881a,
from durable object storage. Data copy failed from PE ivd:8fcad05a-24d9-43c6-997f-5983a8ccf8f0,
to PE ivd:361cff33-9ffa-4385-94b6-836003489f3b:82f34cb3-6a90-4180-9b7c-a1b7933f881a:
Open virtual disk file failed. The error code is 1. with error code: 1'
nextRetryTimestamp: "2022-11-02T16:16:07Z"
phase: Retry
processingNode: np-wc2source-6868bf59c4-nh5sd
progress: {}
retryCount: 2
startTimestamp: "2022-11-02T16:10:38Z"
velero debug --backup whole-wc: bundle-2022-11-02-16-17-46.tar.gz
velero debug --restore whole-wc-20221102161036: bundle-2022-11-02-16-18-13.tar.gz
Both log bundles are attached, thanks.
This is most likely a VDDK issue. Please make sure you retrieve the VDDK logs: https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/main/docs/troubleshooting.md#vddk
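(The VDDK log lives in the datamgr daemonset pod that processed the Download; status.processingNode in the Download CR above shows which node that was, so the matching pod can be located with something like the following, assuming the pods keep the datamgr prefix:)
$ kubectl -n velero get pods -o wide | grep datamgr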
I tried to follow the doc you provided, but:
$ kubectl exec -n velero -it datamgr-for-vsphere-plugin-zz5p9 -- /bin/bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "65429cce8de75f36ed9a2ae758045b2af30c7c1f042208d9a342763b8ada5cbd": OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash: no such file or directory: unknown
$ kubectl -n velero exec -it datamgr-for-vsphere-plugin-zz5p9 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/home/nonroot #
/home/nonroot # cd /tmp/vmware-root/
/tmp/vmware-root # ls -al
total 16
drwx------ 2 root root 4096 Nov 4 07:58 .
drwxrwxrwt 1 root root 4096 Nov 4 07:59 ..
-rw-r----- 1 root root 4330 Nov 4 07:58 vixDiskLib-1.log
/tmp/vmware-root # cat vixDiskLib-1.log
sh: cat: not found
/tmp/vmware-root # vi vixDiskLib-1.log
sh: vi: not found
/tmp/vmware-root # vim vixDiskLib-1.log
sh: vim: not found
It is not easy for me to collect the log. Could you please suggest a better way, or could you try to reproduce the issue? Thanks.
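(A possible workaround, since the image ships neither cat nor an editor: copy the file out of the pod instead of reading it in place. Note that kubectl cp requires tar inside the container, so the plain exec-with-redirect form below is the safer fallback; the pod name is the one from the transcript above.)
$ kubectl -n velero cp datamgr-for-vsphere-plugin-zz5p9:/tmp/vmware-root/vixDiskLib-1.log ./vixDiskLib-1.log
$ kubectl -n velero exec datamgr-for-vsphere-plugin-zz5p9 -- sh -c 'while read -r l; do printf "%s\n" "$l"; done < /tmp/vmware-root/vixDiskLib-1.log' > vixDiskLib-1.log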
Can you show more details from the velero backup so we can see which resources were backed up? We do not recommend backing up the entire cluster, as we can't guarantee that custom resources can be restored. We recommend backing up by namespace or by label selector.
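(For reference, a rough sketch of both suggestions: the list of backed-up resources can be inspected with the --details flag, and a namespace-scoped backup would look like the second command below, where nginx-ns-backup is just a placeholder name.)
$ velero backup describe whole-wc --details
$ velero backup create nginx-ns-backup --include-namespaces nginx-example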
Regarding the following restore command:
velero restore create --from-backup whole-wc --include-resources pv,pvc --existing-resource-policy update
Can you delete the resources and try a restore without the "pv,pvc" include and the "update" policy, as follows? I don't think we've qualified those options.
velero restore create --from-backup whole-wc
At restore time, the PVC will be restored as long as it was backed up, and the PV will be re-created. If you specify including the PV resource, Velero may try to restore it on top of the PV that has already been created, which could also cause the problem.
kubectl get clonefromsnapshots -A
Finally I got the vddk logfile: vixDiskLib-1.log
This time I created a backup with only "pv,pvc":
$ velero backup create backup-pv-pvc2 --include-resources pvc,pv
Backup request "backup-pv-pvc2" submitted successfully.
Run `velero backup describe backup-pv-pvc2` or `velero backup logs backup-pv-pvc2` for more details.
and then deleted the PVC and PV, and tried to restore them:
$ velero restore create --from-backup backup-pv-pvc2
Restore request "backup-pv-pvc2-20221104143207" submitted successfully.
$ kubectl get download download-022d06ac-c4c4-4266-9c37-95195b96da02-0f0057b7-b96c-42b7-94c7-8ee010a66a1f -n velero -o yaml
apiVersion: datamover.cnsdp.vmware.com/v1alpha1
kind: Download
metadata:
creationTimestamp: "2022-11-04T14:32:09Z"
generation: 5
labels:
velero.io/exclude-from-backup: "true"
name: download-022d06ac-c4c4-4266-9c37-95195b96da02-0f0057b7-b96c-42b7-94c7-8ee010a66a1f
namespace: velero
resourceVersion: "6451569"
uid: fe9c08bb-3a3a-41d5-9965-76e3f055ea18
spec:
backupRepositoryName: br-dbd695f5-7b62-4748-8331-36027c843b98
clonefromSnapshotReference: nginx-example/d2341d18-080f-4cde-82e4-d222fc9e5c7d
protectedEntityID: ivd:a418b840-7827-4535-8210-4e41aaee6e64
restoreTimestamp: "2022-11-04T14:32:09Z"
snapshotID: ivd:6397e41d-10ec-4f83-ace2-399f134be05b:022d06ac-c4c4-4266-9c37-95195b96da02
status:
message: 'Failed to download snapshot, ivd:6397e41d-10ec-4f83-ace2-399f134be05b:022d06ac-c4c4-4266-9c37-95195b96da02,
from durable object storage. Data copy failed from PE ivd:a418b840-7827-4535-8210-4e41aaee6e64,
to PE ivd:6397e41d-10ec-4f83-ace2-399f134be05b:022d06ac-c4c4-4266-9c37-95195b96da02:
Open virtual disk file failed. The error code is 1. with error code: 1'
nextRetryTimestamp: "2022-11-04T14:37:41Z"
phase: Retry
processingNode: np-wc2source-6868bf59c4-nh5sd
progress: {}
retryCount: 2
startTimestamp: "2022-11-04T14:32:09Z"
Can you try to include only "pvc" but not "pv" when you do the backup? This is because the PV will be freshly created at restore time. If Velero tries to restore it on top of the newly created PV, it could cause problems.
velero backup create backup-pv-pvc2 --include-resources pvc
> Can you try to include only "pvc" but not "pv" when you do the backup? This is because the PV will be freshly created at restore time. If Velero tries to restore it on top of the newly created PV, it could cause problems.
> velero backup create backup-pv-pvc2 --include-resources pvc
Yes, here is the result:
$ velero restore create --from-backup backup-pv-pvc2 --include-resources pvc
Restore request "backup-pv-pvc2-20221104151007" submitted successfully.
$ kubectl get clonefromsnapshot -A
NAMESPACE NAME AGE
nginx-example f79d494c-f65d-4bdc-9bc0-fe035c2186b7 13s
$ kubectl get download -A
No resources found
$ kubectl get clonefromsnapshot f79d494c-f65d-4bdc-9bc0-fe035c2186b7 -n nginx-example -o yaml
apiVersion: backupdriver.cnsdp.vmware.com/v1alpha1
kind: CloneFromSnapshot
metadata:
creationTimestamp: "2022-11-04T15:10:07Z"
generation: 1
labels:
velero.io/exclude-from-backup: "true"
name: f79d494c-f65d-4bdc-9bc0-fe035c2186b7
namespace: nginx-example
resourceVersion: "6459921"
uid: df7e3e08-b001-4d0a-98be-b6617e0319e6
spec:
apiGroup: ""
backupRepository: br-dbd695f5-7b62-4748-8331-36027c843b98
cloneCancel: false
kind: PersistentVolumeClaim
metadata: Cs4LCgpuZ2lueC1sb2dzEgAaDW5naW54LWV4YW1wbGUiACokOWY3N2M5ZTctOTRhZi00ZjlmLWJmYjEtMjUwNmU4ZDJjMTliMgc2NDQ0NzY2OABCCAj1tpSbBhAAWgwKA2FwcBIFbmdpbnhixQIKMGt1YmVjdGwua3ViZXJuZXRlcy5pby9sYXN0LWFwcGxpZWQtY29uZmlndXJhdGlvbhKQAnsiYXBpVmVyc2lvbiI6InYxIiwia2luZCI6IlBlcnNpc3RlbnRWb2x1bWVDbGFpbSIsIm1ldGFkYXRhIjp7ImFubm90YXRpb25zIjp7fSwibGFiZWxzIjp7ImFwcCI6Im5naW54In0sIm5hbWUiOiJuZ2lueC1sb2dzIiwibmFtZXNwYWNlIjoibmdpbngtZXhhbXBsZSJ9LCJzcGVjIjp7ImFjY2Vzc01vZGVzIjpbIlJlYWRXcml0ZU9uY2UiXSwicmVzb3VyY2VzIjp7InJlcXVlc3RzIjp7InN0b3JhZ2UiOiI1ME1pIn19LCJzdG9yYWdlQ2xhc3NOYW1lIjoidnNwaGVyZS1jc2kifX0KYiYKH3B2Lmt1YmVybmV0ZXMuaW8vYmluZC1jb21wbGV0ZWQSA3llc2IrCiRwdi5rdWJlcm5ldGVzLmlvL2JvdW5kLWJ5LWNvbnRyb2xsZXISA3llc2JHCi12b2x1bWUuYmV0YS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXISFmNzaS52c3BoZXJlLnZtd2FyZS5jb21iQgoodm9sdW1lLmt1YmVybmV0ZXMuaW8vc3RvcmFnZS1wcm92aXNpb25lchIWY3NpLnZzcGhlcmUudm13YXJlLmNvbXIca3ViZXJuZXRlcy5pby9wdmMtcHJvdGVjdGlvbnoAigHAAgoZa3ViZWN0bC1jbGllbnQtc2lkZS1hcHBseRIGVXBkYXRlGgJ2MSIICPW2lJsGEAAyCEZpZWxkc1YxOoACCv0BeyJmOm1ldGFkYXRhIjp7ImY6YW5ub3RhdGlvbnMiOnsiLiI6e30sImY6a3ViZWN0bC5rdWJlcm5ldGVzLmlvL2xhc3QtYXBwbGllZC1jb25maWd1cmF0aW9uIjp7fX0sImY6bGFiZWxzIjp7Ii4iOnt9LCJmOmFwcCI6e319fSwiZjpzcGVjIjp7ImY6YWNjZXNzTW9kZXMiOnt9LCJmOnJlc291cmNlcyI6eyJmOnJlcXVlc3RzIjp7Ii4iOnt9LCJmOnN0b3JhZ2UiOnt9fX0sImY6c3RvcmFnZUNsYXNzTmFtZSI6e30sImY6dm9sdW1lTW9kZSI6e319fUIAigG4AgoXa3ViZS1jb250cm9sbGVyLW1hbmFnZXISBlVwZGF0ZRoCdjEiCAj2tpSbBhAAMghGaWVsZHNWMTr6AQr3AXsiZjptZXRhZGF0YSI6eyJmOmFubm90YXRpb25zIjp7ImY6cHYua3ViZXJuZXRlcy5pby9iaW5kLWNvbXBsZXRlZCI6e30sImY6cHYua3ViZXJuZXRlcy5pby9ib3VuZC1ieS1jb250cm9sbGVyIjp7fSwiZjp2b2x1bWUuYmV0YS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXIiOnt9LCJmOnZvbHVtZS5rdWJlcm5ldGVzLmlvL3N0b3JhZ2UtcHJvdmlzaW9uZXIiOnt9fX0sImY6c3BlYyI6eyJmOnZvbHVtZU5hbWUiOnt9fX1CAIoBmAEKF2t1YmUtY29udHJvbGxlci1tYW5hZ2VyEgZVcGRhdGUaAnYxIggI9raUmwYQADIIRmllbGRzVjE6VQpTeyJmOnN0YXR1cyI6eyJmOmFjY2Vzc01vZGVzIjp7fSwiZjpjYXBhY2l0eSI6eyIuIjp7fSwiZjpzdG9yYWdlIjp7fX0sImY6cGhhc2UiOnt9fX1CBnN0YXR1cxJnCg1SZWFkV3JpdGVPbmNlEhMSEQoHc3RvcmFnZRIGCgQ1ME1pGihwdmMtOWY3N2M5ZTctOTRhZi00ZjlmLWJmYjEtMjUwNmU4ZDJjMTliKgt2c3BoZXJlLWNzaTIKRmlsZXN5c3RlbRopCgVCb3VuZBINUmVhZFdyaXRlT25jZRoRCgdzdG9yYWdlEgYKBDUwTWk=
snapshotID: pvc:nginx-example/nginx-logs:aXZkOjYzOTdlNDFkLTEwZWMtNGY4My1hY2UyLTM5OWYxMzRiZTA1YjowMjJkMDZhYy1jNGM0LTQyNjYtOWMzNy05NTE5NWI5NmRhMDI
status:
phase: New
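(Since this CloneFromSnapshot stays in phase New and no Download CR was created, the backup-driver logs are the next place to look; a sketch, assuming the plugin's default deployment name in the velero namespace:)
$ kubectl -n velero logs deployment/backup-driver --tail=200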
Here is the new vddk log: new-vixDiskLib-1.log
velero debug --backup backup-pv-pvc2: bundle-2022-11-04-15-17-31.tar.gz
velero debug --restore backup-pv-pvc2-20221104151007: bundle-2022-11-04-15-17-54.tar.gz
I see the following in the logs:
vixDiskLib logs:
2022-11-04T14:54:07.636Z| host-13| I005: DISKLIB-DSCPTR: : "ha-nfc://[vsanDatastore] 2f85df62-a08b-badd-ba23-bc97e1d50a70/_00a2/[email protected]:902" : Failed to open NBD extent.
2022-11-04T14:54:07.636Z| host-13| I005: DISKLIB-LINK : "ha-nfc://[vsanDatastore] 2f85df62-a08b-badd-ba23-bc97e1d50a70/_00a2/[email protected]:902" : failed to open (NBD_ERR_GENERIC).
2022-11-04T14:54:07.636Z| host-13| I005: DISKLIB-CHAIN : "ha-nfc://[vsanDatastore] 2f85df62-a08b-badd-ba23-bc97e1d50a70/_00a2/[email protected]:902" : failed to open (NBD_ERR_GENERIC).
2022-11-04T14:54:07.636Z| host-13| I005: DISKLIB-LIB : Failed to open 'ha-nfc://[vsanDatastore] 2f85df62-a08b-badd-ba23-bc97e1d50a70/_00a2/[email protected]:902' with flags 0x800a NBD_ERR_GENERIC (290).
2022-11-04T14:54:07.636Z| host-13| E002: VixDiskLib: Detected DiskLib error 290 (NBD_ERR_GENERIC).
2022-11-04T14:54:07.636Z| host-13| E002: VixDiskLib: Failed to open disk ha-nfc://[vsanDatastore] 2f85df62-a08b-badd-ba23-bc97e1d50a70/_00a2/[email protected]:902!52 29 44 87 ea f1 91 48-bd 51 f6 4a fd a7 1f e0. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 6507.
2022-11-04T14:54:07.640Z| host-13| E002: VixDiskLib: VixDiskLib_OpenEx: Cannot open disk [datastore-1035] VStorageObject:a418b840-7827-4535-8210-4e41aaee6e64. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 7171.
2022-11-04T14:54:07.640Z| host-13| E002: VixDiskLib: VixDiskLib_Open: Cannot open disk . Error 1 (Unknown error) at 7249.
DataMover logs:
-->
2022-11-04T14:54:07.728Z info -[00013] [Originator@6876 sub=vimaccess] AllowVMotion: Enable VMotion.
-->
2022-11-04T14:54:07.731Z verbose -[00013] [Originator@6876 sub=vimaccess] trying to get vStorageObjectManager as typeof N3Vim4Vslm7Vcenter21VStorageObjectManagerE
2022-11-04T14:54:07.779Z verbose -[00013] [Originator@6876 sub=vmomi.soapStub[81]] Resetting stub adapter for server <cs p:00007f32b00102d0, TCP:10.202.240.52:443> : Closed
2022-11-04T14:54:07.779Z verbose -[00013] [Originator@6876 sub=IO] Close; <SSL(<io_obj p:0x00007f32c4102038, h:17, <TCP '100.96.1.253 : 42162'>, <TCP '10.202.240.52 : 443'>>)>
2022-11-04T14:54:07.779Z verbose -[00013] [Originator@6876 sub=IO] shutdown; <io_obj p:0x00007f32c4102038, h:17, <TCP '100.96.1.253 : 42162'>, <TCP '10.202.240.52 : 443'>>
2022-11-04T14:54:07.779Z verbose -[00013] [Originator@6876 sub=IO] Close; <SSL(<io_obj p:0x00007f32640020f8, h:18, <TCP '100.96.1.253 : 42164'>, <TCP '10.202.240.52 : 443'>>)>
2022-11-04T14:54:07.779Z verbose -[00013] [Originator@6876 sub=IO] shutdown; <io_obj p:0x00007f32640020f8, h:18, <TCP '100.96.1.253 : 42164'>, <TCP '10.202.240.52 : 443'>>
2022-11-04T14:54:07.779Z verbose -[00026] [Originator@6876 sub=IO] Close; <SSL(<io_obj p:0x00007f32c4102038, h:-1, <TCP '100.96.1.253 : 42162'>, <TCP '10.202.240.52 : 443'> FD Closed>)>
2022-11-04T14:54:07.779Z verbose -[00022] [Originator@6876 sub=IO] Close; <SSL(<io_obj p:0x00007f32640020f8, h:-1, <TCP '100.96.1.253 : 42164'>, <TCP '10.202.240.52 : 443'> FD Closed>)>
2022-11-04T14:54:07.779Z verbose -[00026] [Originator@6876 sub=IO] Close; <SSL(<io_obj p:0x00007f32c4102038, h:-1, <TCP '100.96.1.253 : 42162'>, <TCP '10.202.240.52 : 443'> FD Closed>)>
2022-11-04T14:54:07.780Z verbose -[00022] [Originator@6876 sub=IO] Close; <SSL(<io_obj p:0x00007f32640020f8, h:-1, <TCP '100.96.1.253 : 42164'>, <TCP '10.202.240.52 : 443'> FD Closed>)>
time="2022-11-04T14:54:07Z" level=info msg="Return from the call of ivd PE overwrite API for remote PE." Remote PEID="ivd:6397e41d-10ec-4f83-ace2-399f134be05b:022d06ac-c4c4-4266-9c37-95195b96da02" logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/dataMover/data_mover.go:176"
time="2022-11-04T14:54:07Z" level=error msg="Failed to overwrite from remote repository." Remote PEID="ivd:6397e41d-10ec-4f83-ace2-399f134be05b:022d06ac-c4c4-4266-9c37-95195b96da02" error="Data copy failed from PE ivd:a418b840-7827-4535-8210-4e41aaee6e64, to PE ivd:6397e41d-10ec-4f83-ace2-399f134be05b:022d06ac-c4c4-4266-9c37-95195b96da02: Open virtual disk file failed. The error code is 1. with error code: 1" error.file="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/ivd/ivd_protected_entity.go:117" error.function=github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/ivd.IVDProtectedEntity.getDataWriter logSource="/go/src/github.com/vmware-tanzu/velero-plugin-for-vsphere/pkg/dataMover/data_mover.go:178"
……..
-->
2022-11-04T14:54:07.960Z info -[00012] [Originator@6876 sub=vimaccess] AllowVMotion: Disable VMotion.
-->
There are frequent "Enable VMotion" and "Disable VMotion" messages in the DataMover logs. What other operation was going on that triggered this? I think that is why we saw "Open virtual disk file failed" errors.
Hi @jhuisss, we'd like to raise the issue with the VDDK team. Could you please collect the vCenter log bundle and the ESXi log bundles?
Hi, thanks for your reply. It is not easy for me to get the requested log bundles; I'll find out how to collect them. In the meantime, could you please try to reproduce the issue?
Hi, any updates? If you cannot reproduce the issue, please let me know, thanks.
I opened an internal bug against VDDK to understand the error code; hopefully they will get back soon. In the meantime, could you please answer Xing's query about the "Enable VMotion" and "Disable VMotion" messages in the DataMover logs? This is something we haven't seen in the past.
Got a reply from the vddk team:
There is no clue about the root cause from the logging; this is usually caused by a network error.
Could you
1. Set the verbose log level for VDDK (currently the verbose level is only set for NFC):
vixDiskLib.transport.LogLevel=6
vixDiskLib.nfc.LogLevel = 2   # avoid the log rotating too quickly
2. Collect the vm-support bundle for the host where the connection failed (see the sketch after the quoted log line below).
> 2022-11-04T15:05:01.987Z| host-17| E002: VixDiskLib: Failed to open disk ha-nfc://[vsanDatastore] 2f85df62-a08b-badd-ba23-bc97e1d50a70/_00a2/[email protected]:902!52 44 8e 8b 79 de 38 df-97 34 e3 0c ba ad e8 9a. Error 1 (Unknown error) (DiskLib error 290: NBD_ERR_GENERIC) at 6507.
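(For item 2, the host bundle is generated on the ESXi host itself; a minimal sketch, assuming SSH access to the host is enabled and that vm-support writes its bundle under /var/tmp as usual:)
$ ssh root@<esxi-host> vm-support
$ scp 'root@<esxi-host>:/var/tmp/esx-*.tgz' .
(The same bundle can also be exported from the vSphere Client via "Export System Logs" on the host.)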