velero-plugin-for-vsphere
PVs with StorageClass storageclass.vsphere-thin aren't backed up/restored
What steps did you take and what happened: Installed Velero with the following steps:
velero install \
--image arti-dockerregistry.e-bk.m086/docker-build/velero/velero:v1.6.2 \
--provider aws \
--plugins arti-dockerregistry.e-bk.m086/docker-build/velero/velero-plugin-for-aws:v1.2.1 \
--bucket velero \
--secret-file ./credentials-minio-test \
--use-volume-snapshots=false \
--use-restic \
--backup-location-config \
region=minio,s3ForcePathStyle="true",s3Url=http://10.2.216.24:9000,publicUrl=http://10.2.216.24:9000
kubectl get ds -n velero -o yaml | sed "s/path: \/var\/lib\/kubelet\/pods/path: \/var\/vcap\/data\/kubelet\/pods/g" | kubectl replace -f -
kubectl create secret generic vsphere-config-secret --from-file=velero-vsphere-test.conf --namespace=kube-system
velero plugin add arti-dockerregistry.e-bk.m086/docker-build/vsphereveleroplugin/velero-plugin-for-vsphere:v1.1.1
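(As a side note, to confirm the plugin registration after these steps one can list the server-side plugins; a minimal sketch with the stock Velero CLI:)
velero plugin get -n velero
# should list the registered plugins, including the vSphere plugin, if the add step succeeded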
And made a backup with:
velero backup create e-xdm-application.manual --include-namespaces e-xdm-application
What did you expect to happen: We've already done a couple of restores with Velero and everything worked fine, but recently we migrated a whole cluster and found that Velero didn't restore (or back up?) PVs with the StorageClass 'storageclass.vsphere-thin', even though there were no errors during the backup or the restore (all items were reported as successfully restored by Velero).
The output of the following commands will help us better understand what's going on (to avoid possible confusion: our PVs are named pvc-<uid>):
- kubectl logs deployment/velero -n velero: https://gist.github.com/MikeK184/a3e78a0bc69d81069a135761e1402fe7
- Backup-Logs via: https://gist.github.com/MikeK184/339600cff9e8698c57344002fd6db8f5
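(For context, a sketch of how the affected PVs can be listed together with their storage class; the column selection is just an illustration:)
kubectl get pv -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName,CLAIM:.spec.claimRef.name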
I've encountered the following error: Error: Could not retrieve components: Could not find subcomponent PEID for pvc
What does PEID stand for in this case?
Anything else you would like to add: During the mentioned cluster migration we didn't back up each application on its own (~150 of them). Instead we made backups of argocd, sealed-secrets (config + master key), storageclasses and finally all PVs (--include-resources persistentvolumes). None of those backups/restores appeared to produce an error in a describe. Unfortunately I don't have the exact logs anymore, but I think the issue is visible in the logs above.
Environment:
- Velero version (use velero version): Client: Version: v1.6.3, Git commit: 5fe3a50bfddc2becb4c0bd5e2d3d4053a23e95d2; Server: Version: v1.6.2
- Velero features (use velero client config get features): features: <NOT SET>
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6+vmware.1", GitCommit:"088f01db2ffab397a290be443902918b59ee032c", GitTreeState:"clean", BuildDate:"2021-04-17T01:01:00Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.22) and server (1.20) exceeds the supported minor version skew of +/-1
- Kubernetes installer & version:
VMware TKGI v1.11.2
- OS (e.g. from /etc/os-release): Ubuntu 16.04.7 LTS
Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
- :+1: for "I would like to see this bug fixed as soon as possible"
- :-1: for "There are more important bugs to focus on right now"
Hi @MikeK184 - The errors that you are seeing are coming from the vSphere plugin, so I will transfer the issue to that repo.
@xing-yang @lintongj If this is a Velero issue, please let us know and we'll transfer it back. Thanks!
@MikeK184 would you please share the YAMLs of the storage class and the PV/PVC objects, as well as the backup-driver log, by following the troubleshooting page?
Also, it looks like you are using Velero with restic. Were you deliberately doing backups with velero-plugin-for-vsphere as well?
Hi there @lintongj,
Here are all our StorageClasses in one of our clusters: https://gist.github.com/MikeK184/c4e5f024a9c2b6b9611356d39f418465
And here are 4 PVs followed by their PVCs: https://gist.github.com/MikeK184/e398f2421d6d3cd561d512758a9a66e5
And additional info regarding the backup-driver:
Environment
Can be found in my initial post
Logs
Velero Deployment Log: https://gist.github.com/MikeK184/7ff67e8113d8f8f93e7dca47e05631d6
Velero Backup describe: See picture in initial post
Velero Backup Driver Log: https://gist.github.com/MikeK184/8084a163c41384a82d61960e5ffda8c9
Velero Daemonset Datamgr Log: https://gist.github.com/MikeK184/cd835cca29d39ef69ad6245d63f2992e
Velero related CRDs:
customresourcedefinition.apiextensions.k8s.io/backupstoragelocations.velero.io
customresourcedefinition.apiextensions.k8s.io/deletebackuprequests.velero.io
customresourcedefinition.apiextensions.k8s.io/downloadrequests.velero.io
customresourcedefinition.apiextensions.k8s.io/podvolumebackups.velero.io
customresourcedefinition.apiextensions.k8s.io/podvolumerestores.velero.io
customresourcedefinition.apiextensions.k8s.io/resticrepositories.velero.io
customresourcedefinition.apiextensions.k8s.io/restores.velero.io
customresourcedefinition.apiextensions.k8s.io/schedules.velero.io
customresourcedefinition.apiextensions.k8s.io/serverstatusrequests.velero.io
customresourcedefinition.apiextensions.k8s.io/volumesnapshotlocations.velero.io
All resources (pods) in the velero namespace:
$ kubectl get all -n velero
NAME READY STATUS RESTARTS AGE
pod/backup-driver-5bd759f4b-xc82h 1/1 Running 0 26h
pod/datamgr-for-vsphere-plugin-4c9gs 1/1 Running 0 26h
pod/datamgr-for-vsphere-plugin-9qjg4 1/1 Running 0 26h
pod/datamgr-for-vsphere-plugin-mfbv8 1/1 Running 0 26h
pod/datamgr-for-vsphere-plugin-t25h8 1/1 Running 0 26h
pod/datamgr-for-vsphere-plugin-w7hv2 1/1 Running 0 26h
pod/datamgr-for-vsphere-plugin-wlch9 1/1 Running 0 26h
pod/restic-8j9fj 1/1 Running 0 26h
pod/restic-8nvvf 1/1 Running 0 26h
pod/restic-b4j4b 1/1 Running 0 26h
pod/restic-ht4fg 1/1 Running 0 26h
pod/restic-kf8m8 1/1 Running 0 26h
pod/restic-plz66 1/1 Running 0 26h
pod/velero-fbf58d4b9-td4hv 1/1 Running 0 19h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/datamgr-for-vsphere-plugin 6 6 6 6 6 <none> 26h
daemonset.apps/restic 6 6 6 6 6 <none> 26h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/backup-driver 1/1 1 1 26h
deployment.apps/velero 1/1 1 1 26h
NAME DESIRED CURRENT READY AGE
replicaset.apps/backup-driver-5bd759f4b 1 1 1 26h
replicaset.apps/velero-5d747f8477 0 0 0 26h
replicaset.apps/velero-6f9bd594b7 0 0 0 26h
replicaset.apps/velero-fbf58d4b9 1 1 1 19h
Regarding your question:
Also, looks like you are using velero with restic. Were you doing backup using velero with velero-plugin-for-vsphere deliberately as well?
Yes and no. Initially I did the backup simply by executing velero backup create e-xdm-application.manual --include-namespaces e-xdm-application,
but I've also tried velero backup create e-xdm-application-manual1 --default-volumes-to-restic --include-namespaces e-xdm-application.
That took quite a while to back up, but it also partially failed; the output of the describe:
$ velero backup describe e-xdm-application-manual1 --details
Name: e-xdm-application-manual1
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.20.6+vmware.1
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=20
Phase: PartiallyFailed (run `velero backup logs e-xdm-application-manual1` for more information)
Errors: 9
Warnings: 0
Namespaces:
Included: e-xdm-application
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2021-11-11 10:56:29 +0100 CET
Completed: 2021-11-11 10:57:52 +0100 CET
Expiration: 2021-12-11 10:56:29 +0100 CET
Total items to be backed up: 122
Items backed up: 122
Resource List:
apiextensions.k8s.io/v1/CustomResourceDefinition:
- sealedsecrets.bitnami.com
apps/v1/Deployment:
- e-xdm-application/core-server
- e-xdm-application/dataflow-server
- e-xdm-application/neo4j
- e-xdm-application/postgres
- e-xdm-application/web-ui
apps/v1/ReplicaSet:
- e-xdm-application/core-server-5c9bb78b4
- e-xdm-application/core-server-fdc4c5f79
- e-xdm-application/dataflow-server-6c78bb6ff6
- e-xdm-application/dataflow-server-784474b9cd
- e-xdm-application/neo4j-58bc665465
- e-xdm-application/neo4j-6d9b6454cd
- e-xdm-application/postgres-54bb67d65f
- e-xdm-application/postgres-698d9dc6c9
- e-xdm-application/postgres-849b7f87dc
- e-xdm-application/web-ui-57bfb89879
- e-xdm-application/web-ui-6976bdd789
- e-xdm-application/web-ui-6f44898db7
- e-xdm-application/web-ui-799bd6bc96
batch/v1/Job:
- e-xdm-application/postgres-backup-1636617600
batch/v1beta1/CronJob:
- e-xdm-application/postgres-backup
bitnami.com/v1alpha1/SealedSecret:
- e-xdm-application/xdm-application-secret-database
- e-xdm-application/xdm-application-secret-key-store
- e-xdm-application/xdm-application-secret-ldap
- e-xdm-application/xdm-application-secret-license
- e-xdm-application/xdm-application-secret-neo4j
discovery.k8s.io/v1beta1/EndpointSlice:
- e-xdm-application/core-server-ndmhv
- e-xdm-application/dataflow-server-mhtw7
- e-xdm-application/neo4j-nrwjf
- e-xdm-application/postgres-tgn9w
- e-xdm-application/xdm-application-web-ui-4hdqd
extensions/v1beta1/Ingress:
- e-xdm-application/xdm-application-web-ui
- e-xdm-application/xdm-application-web-ui-ssl
networking.k8s.io/v1/Ingress:
- e-xdm-application/xdm-application-web-ui
- e-xdm-application/xdm-application-web-ui-ssl
networking.k8s.io/v1/NetworkPolicy:
- e-xdm-application/allow-api-server
- e-xdm-application/allow-dynatrace
- e-xdm-application/default-deny-all
- e-xdm-application/default.allow-all-egress-internally
- e-xdm-application/default.allow-all-egress-mf-ftps
- e-xdm-application/default.backend-ingress
- e-xdm-application/default.db-ingress
- e-xdm-application/default.frontend-ingress
- e-xdm-application/default.nfs
rbac.authorization.k8s.io/v1/RoleBinding:
- e-xdm-application/e-xdm-application-dev-rolebinding
- e-xdm-application/e-xdm-application-psp-restricted-strict-rolebinding
- e-xdm-application/e-xdm-application-view-rolebinding
v1/ConfigMap:
- e-xdm-application/kube-root-ca.crt
- e-xdm-application/xdm-application-configmap-pgsql
- e-xdm-application/xdm-application-configmap-scripts
- e-xdm-application/xdm-application-configmap-xdm
v1/Endpoints:
- e-xdm-application/core-server
- e-xdm-application/dataflow-server
- e-xdm-application/neo4j
- e-xdm-application/postgres
- e-xdm-application/xdm-application-web-ui
v1/Event:
- e-xdm-application/postgres-54bb67d65f-ttdrr.16b6738d4138bafa
- e-xdm-application/postgres-54bb67d65f.16b6738d41960304
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b6738f1e375e29
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b673906b2e4189
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b673906f44872a
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b6739074c96331
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b67390ac6278a4
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b6739128ffc8a3
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b673913bed82ac
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b67391421b5386
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b673917f2a555c
- e-xdm-application/postgres-849b7f87dc-fpr5t.16b67396a9403860
- e-xdm-application/postgres-849b7f87dc.16b6738f1d752147
- e-xdm-application/postgres.16b6738d402d3564
- e-xdm-application/postgres.16b6738f1bd43013
- e-xdm-application/web-ui-57bfb89879-k2qjr.16b673b7e2fdbfa4
- e-xdm-application/web-ui-57bfb89879.16b673b7e388fb3a
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738d46fcdbf4
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738df4179694
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738ed6973311
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738f42cf81cb
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738f7be4e996
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738fb867ff61
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738fbc60639d
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738fc054dc5e
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738ff8970a85
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b67390eaa73f3d
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b67390eef3b0ff
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b67390f3915414
- e-xdm-application/web-ui-6f44898db7-2mxnc.16b673912ed747b8
- e-xdm-application/web-ui-6f44898db7.16b6738d46817951
- e-xdm-application/web-ui.16b6738d4363c8cd
- e-xdm-application/web-ui.16b673b7e2aed6be
v1/LimitRange:
- e-xdm-application/limitrange
v1/Namespace:
- e-xdm-application
v1/PersistentVolume:
- e-xdm-application-pv
- pvc-07d5a5b9-4c7a-42d6-8122-536616f51baa
- pvc-1c7b34d3-898f-4788-a9a6-352a05307cde
- pvc-3834e7a0-b764-4c70-9332-ac3e3982cfe1
- pvc-bbe21cc5-1284-4220-99ea-3e784711c032
- pvc-cc548c2e-c03b-4da8-8479-2cbfea721411
v1/PersistentVolumeClaim:
- e-xdm-application/core-server-backup
- e-xdm-application/core-server-data
- e-xdm-application/core-server-tasks
- e-xdm-application/e-xdm-application-pvc
- e-xdm-application/postgres-backup
- e-xdm-application/postgres-data
v1/Pod:
- e-xdm-application/core-server-5c9bb78b4-l8fld
- e-xdm-application/dataflow-server-6c78bb6ff6-v46bq
- e-xdm-application/neo4j-6d9b6454cd-5fc24
- e-xdm-application/postgres-849b7f87dc-fpr5t
- e-xdm-application/postgres-backup-1636617600-qm8wd
- e-xdm-application/web-ui-6f44898db7-2mxnc
v1/ResourceQuota:
- e-xdm-application/quota
v1/Secret:
- e-xdm-application/default-token-lfh98
- e-xdm-application/xdm-application-secret-database
- e-xdm-application/xdm-application-secret-key-store
- e-xdm-application/xdm-application-secret-ldap
- e-xdm-application/xdm-application-secret-license
- e-xdm-application/xdm-application-secret-neo4j
v1/Service:
- e-xdm-application/core-server
- e-xdm-application/dataflow-server
- e-xdm-application/neo4j
- e-xdm-application/postgres
- e-xdm-application/xdm-application-web-ui
v1/ServiceAccount:
- e-xdm-application/default
Velero-Native Snapshots: <none included>
Restic Backups:
Completed:
e-xdm-application/dataflow-server-6c78bb6ff6-v46bq: data, db2zos-custom, tasks, tmp
e-xdm-application/neo4j-6d9b6454cd-5fc24: tmp, var-lib-neo4j
e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-conf, nginx-tmp, nginx-var-lib-logs, web-certificates
Failed:
e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-share
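(For the failed nginx-share volume, the corresponding PodVolumeBackup object can also be inspected; a sketch, assuming the velero.io/backup-name label that Velero puts on these objects, and <name-of-failed-pvb> is a placeholder:)
kubectl -n velero get podvolumebackups -l velero.io/backup-name=e-xdm-application-manual1
kubectl -n velero describe podvolumebackup <name-of-failed-pvb>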
Logs with restic: https://gist.github.com/MikeK184/8c216c1cc86fa3a00d7851ca1922d91a
If you need anything else please let me know!
@MikeK184 Thanks for providing all the information above.
The volumes that failed in the backup are not provisioned by the vSphere CSI driver but by the in-tree vSphere volume plugin. Please check the provisioner: kubernetes.io/vsphere-volume in the StorageClass below.
allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":false,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"labels":{"app.kubernetes.io/instance":"applications"},"name":"arz.storageclass.vsphere-thin"},"parameters":{"diskformat":"thin"},"provisioner":"kubernetes.io/vsphere-volume"}
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2021-10-06T12:20:50Z"
  labels:
    app.kubernetes.io/instance: applications
  name: arz.storageclass.vsphere-thin
  resourceVersion: "58338943"
  uid: 663da4b1-0585-41bc-bda7-7202921caaaa
parameters:
  diskformat: thin
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate
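(A quick way to tell in-tree from CSI-provisioned volumes on the PV objects themselves, just a sketch where <pv-name> is a placeholder: in-tree vSphere PVs carry a spec.vsphereVolume source, while CSI PVs carry spec.csi with driver csi.vsphere.vmware.com.)
kubectl get pv <pv-name> -o jsonpath='{.spec.vsphereVolume}{"\n"}{.spec.csi.driver}{"\n"}'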
velero-plugin-for-vsphere doesn't support backing up in-tree volumes. That's why you run into this failure with velero-plugin-for-vsphere. The related logging was not improved until v1.2.0, which is why the error message is not very self-explanatory.
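(For in-tree volumes, restic is the usual fallback. Besides --default-volumes-to-restic, individual volumes can be opted in through the pod annotation; a sketch, where the volume name postgres-data is only illustrative:)
kubectl -n e-xdm-application annotate pod/postgres-849b7f87dc-fpr5t backup.velero.io/backup-volumes=postgres-data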
In terms of the errors in the restic backup, we will need help from the Velero team. Hi @zubron, please feel free to transfer this issue back.
Restic Backups:
Completed:
e-xdm-application/dataflow-server-6c78bb6ff6-v46bq: data, db2zos-custom, tasks, tmp
e-xdm-application/neo4j-6d9b6454cd-5fc24: tmp, var-lib-neo4j
e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-conf, nginx-tmp, nginx-var-lib-logs, web-certificates
Failed:
e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-share