
PVs with StorageClass storageclass.vsphere-thin aren't restored/backed up

MikeK184 opened this issue · 4 comments

What steps did you take and what happened: Installed Velero with the following steps:

velero install \
--image arti-dockerregistry.e-bk.m086/docker-build/velero/velero:v1.6.2 \
--provider aws \
--plugins arti-dockerregistry.e-bk.m086/docker-build/velero/velero-plugin-for-aws:v1.2.1 \
--bucket velero \
--secret-file ./credentials-minio-test \
--use-volume-snapshots=false \
--use-restic \
--backup-location-config \
region=minio,s3ForcePathStyle="true",s3Url=http://10.2.216.24:9000,publicUrl=http://10.2.216.24:9000

kubectl get ds -n velero -o yaml | sed "s/path: \/var\/lib\/kubelet\/pods/path: \/var\/vcap\/data\/kubelet\/pods/g" | kubectl replace -f -
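
For context: TKGI deploys the kubelet under BOSH, so pod volumes live under /var/vcap/data/kubelet/pods rather than /var/lib/kubelet/pods; the sed above repoints the restic DaemonSet's hostPath accordingly. A quick check that the patch took effect (a sketch, assuming the stock Velero volume name host-pods):

# should print /var/vcap/data/kubelet/pods after the patch
kubectl -n velero get ds restic \
  -o jsonpath='{.spec.template.spec.volumes[?(@.name=="host-pods")].hostPath.path}'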

kubectl create secret generic vsphere-config-secret --from-file=velero-vsphere-test.conf --namespace=kube-system
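
The velero-vsphere-test.conf holds the vCenter credentials in the usual vSphere config-file format; a minimal sketch with placeholder values (the cluster-id, vCenter address, and user below are stand-ins, not our real values):

[Global]
cluster-id = "my-tkgi-cluster"

[VirtualCenter "vcenter.example.com"]
user = "velero@vsphere.local"
password = "REDACTED"
port = "443"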

velero plugin add arti-dockerregistry.e-bk.m086/docker-build/vsphereveleroplugin/velero-plugin-for-vsphere:v1.1.1

And made a backup with: velero backup create e-xdm-application.manual --include-namespaces e-xdm-application

[image: screenshot of the backup describe output]

What did you expect to happen: We've already done a couple of restores with Velero and everything worked fine, but recently we migrated a whole cluster and found that Velero didn't restore (or back up?) PVs with the StorageClass 'storageclass.vsphere-thin', even though there were no errors during the backup or the restore (according to Velero, all items were restored successfully).

The output of the following commands will help us better understand what's going on: To avoid possible confusion, our PVs are named pvc-<uid>

  • kubectl logs deployment/velero -n velero: https://gist.github.com/MikeK184/a3e78a0bc69d81069a135761e1402fe7
  • Backup logs: https://gist.github.com/MikeK184/339600cff9e8698c57344002fd6db8f5

I've encountered the following error: "Error: Could not retrieve components: Could not find subcomponent PEID for pvc". What does PEID stand for in this case?

Anything else you would like to add: During the mentioned cluster migration we did not back up each application on its own (~150 of them). Instead we made backups of argocd, sealed-secrets (config + master key), the storage classes, and finally all PVs (--include-resources persistentvolumes). None of those backups/restores showed an error in a describe. Unfortunately I no longer have the exact logs, but I think the issue is visible in the logs above.

Environment:

  • Velero version (use velero version): Client v1.6.3 (git commit 5fe3a50bfddc2becb4c0bd5e2d3d4053a23e95d2), Server v1.6.2
  • Velero features (use velero client config get features): features: <NOT SET>
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6+vmware.1", GitCommit:"088f01db2ffab397a290be443902918b59ee032c", GitTreeState:"clean", BuildDate:"2021-04-17T01:01:00Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.22) and server (1.20) exceeds the supported minor version skew of +/-1
  • Kubernetes installer & version: Vmware TKGI v1.11.2
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.7 LTS

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" to the upper right of this comment to vote.

  • :+1: for "I would like to see this bug fixed as soon as possible"
  • :-1: for "There are more important bugs to focus on right now"

— MikeK184, Nov 10 '21 08:11

Hi @MikeK184 - The errors that you are seeing are coming from the vSphere plugin, so I will transfer the issue to that repo.

@xing-yang @lintongj If this is a Velero issue, please let us know and we'll transfer it back. Thanks!

— zubron, Nov 10 '21 15:11

@MikeK184 would you please share the YAMLs of the storage class and PV/PVC objects, as well as the backup-driver log, following the troubleshooting page?
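
For reference, something along these lines should capture everything we need (adjust the placeholder names to your objects):

kubectl get storageclass -o yaml
kubectl get pv <pv-name> -o yaml
kubectl -n <app-namespace> get pvc -o yaml
kubectl -n velero logs deploy/backup-driver > backup-driver.log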

Also, it looks like you are using Velero with restic. Were you also deliberately backing up using Velero with velero-plugin-for-vsphere?

— lintongj, Nov 10 '21 22:11

Hi there @lintongj,

Here are all our storage classes in one of our clusters: https://gist.github.com/MikeK184/c4e5f024a9c2b6b9611356d39f418465

And here are 4 PVs followed by their PVCs: https://gist.github.com/MikeK184/e398f2421d6d3cd561d512758a9a66e5

And additional info regarding the backup-driver:

Environment

Can be found in my initial post

Logs

  • Velero Deployment Log: https://gist.github.com/MikeK184/7ff67e8113d8f8f93e7dca47e05631d6
  • Velero Backup describe: see picture in initial post
  • Velero Backup Driver Log: https://gist.github.com/MikeK184/8084a163c41384a82d61960e5ffda8c9
  • Velero Daemonset Datamgr Log: https://gist.github.com/MikeK184/cd835cca29d39ef69ad6245d63f2992e
  • Velero-related CRDs:

customresourcedefinition.apiextensions.k8s.io/backupstoragelocations.velero.io
customresourcedefinition.apiextensions.k8s.io/deletebackuprequests.velero.io
customresourcedefinition.apiextensions.k8s.io/downloadrequests.velero.io
customresourcedefinition.apiextensions.k8s.io/podvolumebackups.velero.io
customresourcedefinition.apiextensions.k8s.io/podvolumerestores.velero.io
customresourcedefinition.apiextensions.k8s.io/resticrepositories.velero.io
customresourcedefinition.apiextensions.k8s.io/restores.velero.io
customresourcedefinition.apiextensions.k8s.io/schedules.velero.io
customresourcedefinition.apiextensions.k8s.io/serverstatusrequests.velero.io
customresourcedefinition.apiextensions.k8s.io/volumesnapshotlocations.velero.io
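
For reference, a list in this form comes from something like:

kubectl get crd -o name | grep velero.io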

All resources (pods) in the velero namespace:

$ kubectl get all -n velero

NAME                                   READY   STATUS    RESTARTS   AGE
pod/backup-driver-5bd759f4b-xc82h      1/1     Running   0          26h
pod/datamgr-for-vsphere-plugin-4c9gs   1/1     Running   0          26h
pod/datamgr-for-vsphere-plugin-9qjg4   1/1     Running   0          26h
pod/datamgr-for-vsphere-plugin-mfbv8   1/1     Running   0          26h
pod/datamgr-for-vsphere-plugin-t25h8   1/1     Running   0          26h
pod/datamgr-for-vsphere-plugin-w7hv2   1/1     Running   0          26h
pod/datamgr-for-vsphere-plugin-wlch9   1/1     Running   0          26h
pod/restic-8j9fj                       1/1     Running   0          26h
pod/restic-8nvvf                       1/1     Running   0          26h
pod/restic-b4j4b                       1/1     Running   0          26h
pod/restic-ht4fg                       1/1     Running   0          26h
pod/restic-kf8m8                       1/1     Running   0          26h
pod/restic-plz66                       1/1     Running   0          26h
pod/velero-fbf58d4b9-td4hv             1/1     Running   0          19h

NAME                                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/datamgr-for-vsphere-plugin   6         6         6       6            6           <none>          26h
daemonset.apps/restic                       6         6         6       6            6           <none>          26h

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/backup-driver   1/1     1            1           26h
deployment.apps/velero          1/1     1            1           26h

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/backup-driver-5bd759f4b   1         1         1       26h
replicaset.apps/velero-5d747f8477         0         0         0       26h
replicaset.apps/velero-6f9bd594b7         0         0         0       26h
replicaset.apps/velero-fbf58d4b9          1         1         1       19h

Regarding your question:

Also, it looks like you are using Velero with restic. Were you also deliberately backing up using Velero with velero-plugin-for-vsphere?

Yes and no. Initially I did the backup simply by executing velero backup create e-xdm-application.manual --include-namespaces e-xdm-application, but I've also tried velero backup create e-xdm-application-manual1 --default-volumes-to-restic --include-namespaces e-xdm-application. That backup took quite a while and also partially failed; the output of the describe:

$ velero backup describe e-xdm-application-manual1 --details
Name:         e-xdm-application-manual1
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.20.6+vmware.1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=20

Phase:  PartiallyFailed (run `velero backup logs e-xdm-application-manual1` for more information)

Errors:    9
Warnings:  0

Namespaces:
  Included:  e-xdm-application
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2021-11-11 10:56:29 +0100 CET
Completed:  2021-11-11 10:57:52 +0100 CET

Expiration:  2021-12-11 10:56:29 +0100 CET

Total items to be backed up:  122
Items backed up:              122

Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - sealedsecrets.bitnami.com
  apps/v1/Deployment:
    - e-xdm-application/core-server
    - e-xdm-application/dataflow-server
    - e-xdm-application/neo4j
    - e-xdm-application/postgres
    - e-xdm-application/web-ui
  apps/v1/ReplicaSet:
    - e-xdm-application/core-server-5c9bb78b4
    - e-xdm-application/core-server-fdc4c5f79
    - e-xdm-application/dataflow-server-6c78bb6ff6
    - e-xdm-application/dataflow-server-784474b9cd
    - e-xdm-application/neo4j-58bc665465
    - e-xdm-application/neo4j-6d9b6454cd
    - e-xdm-application/postgres-54bb67d65f
    - e-xdm-application/postgres-698d9dc6c9
    - e-xdm-application/postgres-849b7f87dc
    - e-xdm-application/web-ui-57bfb89879
    - e-xdm-application/web-ui-6976bdd789
    - e-xdm-application/web-ui-6f44898db7
    - e-xdm-application/web-ui-799bd6bc96
  batch/v1/Job:
    - e-xdm-application/postgres-backup-1636617600
  batch/v1beta1/CronJob:
    - e-xdm-application/postgres-backup
  bitnami.com/v1alpha1/SealedSecret:
    - e-xdm-application/xdm-application-secret-database
    - e-xdm-application/xdm-application-secret-key-store
    - e-xdm-application/xdm-application-secret-ldap
    - e-xdm-application/xdm-application-secret-license
    - e-xdm-application/xdm-application-secret-neo4j
  discovery.k8s.io/v1beta1/EndpointSlice:
    - e-xdm-application/core-server-ndmhv
    - e-xdm-application/dataflow-server-mhtw7
    - e-xdm-application/neo4j-nrwjf
    - e-xdm-application/postgres-tgn9w
    - e-xdm-application/xdm-application-web-ui-4hdqd
  extensions/v1beta1/Ingress:
    - e-xdm-application/xdm-application-web-ui
    - e-xdm-application/xdm-application-web-ui-ssl
  networking.k8s.io/v1/Ingress:
    - e-xdm-application/xdm-application-web-ui
    - e-xdm-application/xdm-application-web-ui-ssl
  networking.k8s.io/v1/NetworkPolicy:
    - e-xdm-application/allow-api-server
    - e-xdm-application/allow-dynatrace
    - e-xdm-application/default-deny-all
    - e-xdm-application/default.allow-all-egress-internally
    - e-xdm-application/default.allow-all-egress-mf-ftps
    - e-xdm-application/default.backend-ingress
    - e-xdm-application/default.db-ingress
    - e-xdm-application/default.frontend-ingress
    - e-xdm-application/default.nfs
  rbac.authorization.k8s.io/v1/RoleBinding:
    - e-xdm-application/e-xdm-application-dev-rolebinding
    - e-xdm-application/e-xdm-application-psp-restricted-strict-rolebinding
    - e-xdm-application/e-xdm-application-view-rolebinding
  v1/ConfigMap:
    - e-xdm-application/kube-root-ca.crt
    - e-xdm-application/xdm-application-configmap-pgsql
    - e-xdm-application/xdm-application-configmap-scripts
    - e-xdm-application/xdm-application-configmap-xdm
  v1/Endpoints:
    - e-xdm-application/core-server
    - e-xdm-application/dataflow-server
    - e-xdm-application/neo4j
    - e-xdm-application/postgres
    - e-xdm-application/xdm-application-web-ui
  v1/Event:
    - e-xdm-application/postgres-54bb67d65f-ttdrr.16b6738d4138bafa
    - e-xdm-application/postgres-54bb67d65f.16b6738d41960304
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b6738f1e375e29
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b673906b2e4189
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b673906f44872a
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b6739074c96331
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b67390ac6278a4
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b6739128ffc8a3
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b673913bed82ac
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b67391421b5386
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b673917f2a555c
    - e-xdm-application/postgres-849b7f87dc-fpr5t.16b67396a9403860
    - e-xdm-application/postgres-849b7f87dc.16b6738f1d752147
    - e-xdm-application/postgres.16b6738d402d3564
    - e-xdm-application/postgres.16b6738f1bd43013
    - e-xdm-application/web-ui-57bfb89879-k2qjr.16b673b7e2fdbfa4
    - e-xdm-application/web-ui-57bfb89879.16b673b7e388fb3a
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738d46fcdbf4
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738df4179694
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738ed6973311
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738f42cf81cb
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738f7be4e996
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738fb867ff61
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738fbc60639d
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738fc054dc5e
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b6738ff8970a85
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b67390eaa73f3d
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b67390eef3b0ff
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b67390f3915414
    - e-xdm-application/web-ui-6f44898db7-2mxnc.16b673912ed747b8
    - e-xdm-application/web-ui-6f44898db7.16b6738d46817951
    - e-xdm-application/web-ui.16b6738d4363c8cd
    - e-xdm-application/web-ui.16b673b7e2aed6be
  v1/LimitRange:
    - e-xdm-application/limitrange
  v1/Namespace:
    - e-xdm-application
  v1/PersistentVolume:
    - e-xdm-application-pv
    - pvc-07d5a5b9-4c7a-42d6-8122-536616f51baa
    - pvc-1c7b34d3-898f-4788-a9a6-352a05307cde
    - pvc-3834e7a0-b764-4c70-9332-ac3e3982cfe1
    - pvc-bbe21cc5-1284-4220-99ea-3e784711c032
    - pvc-cc548c2e-c03b-4da8-8479-2cbfea721411
  v1/PersistentVolumeClaim:
    - e-xdm-application/core-server-backup
    - e-xdm-application/core-server-data
    - e-xdm-application/core-server-tasks
    - e-xdm-application/e-xdm-application-pvc
    - e-xdm-application/postgres-backup
    - e-xdm-application/postgres-data
  v1/Pod:
    - e-xdm-application/core-server-5c9bb78b4-l8fld
    - e-xdm-application/dataflow-server-6c78bb6ff6-v46bq
    - e-xdm-application/neo4j-6d9b6454cd-5fc24
    - e-xdm-application/postgres-849b7f87dc-fpr5t
    - e-xdm-application/postgres-backup-1636617600-qm8wd
    - e-xdm-application/web-ui-6f44898db7-2mxnc
  v1/ResourceQuota:
    - e-xdm-application/quota
  v1/Secret:
    - e-xdm-application/default-token-lfh98
    - e-xdm-application/xdm-application-secret-database
    - e-xdm-application/xdm-application-secret-key-store
    - e-xdm-application/xdm-application-secret-ldap
    - e-xdm-application/xdm-application-secret-license
    - e-xdm-application/xdm-application-secret-neo4j
  v1/Service:
    - e-xdm-application/core-server
    - e-xdm-application/dataflow-server
    - e-xdm-application/neo4j
    - e-xdm-application/postgres
    - e-xdm-application/xdm-application-web-ui
  v1/ServiceAccount:
    - e-xdm-application/default

Velero-Native Snapshots: <none included>

Restic Backups:
  Completed:
    e-xdm-application/dataflow-server-6c78bb6ff6-v46bq: data, db2zos-custom, tasks, tmp
    e-xdm-application/neo4j-6d9b6454cd-5fc24: tmp, var-lib-neo4j
    e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-conf, nginx-tmp, nginx-var-lib-logs, web-certificates
  Failed:
    e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-share

Logs with restic: https://gist.github.com/MikeK184/8c216c1cc86fa3a00d7851ca1922d91a

If you need anything else please let me know!

— MikeK184, Nov 11 '21 10:11

@MikeK184 Thanks for providing all the information above.

The volumes that failed in the backup are not provisioned by the vSphere CSI driver, but by the in-tree vSphere volume plugin. Please note the provisioner kubernetes.io/vsphere-volume below (a quick way to check this across all storage classes follows the YAML).

allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":false,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"labels":{"app.kubernetes.io/instance":"applications"},"name":"arz.storageclass.vsphere-thin"},"parameters":{"diskformat":"thin"},"provisioner":"kubernetes.io/vsphere-volume"}
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2021-10-06T12:20:50Z"
  labels:
    app.kubernetes.io/instance: applications
  name: arz.storageclass.vsphere-thin
  resourceVersion: "58338943"
  uid: 663da4b1-0585-41bc-bda7-7202921caaaa
parameters:
  diskformat: thin
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate
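
A quick way to check this across all storage classes (a sketch):

kubectl get storageclass -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner'
# in-tree vSphere: kubernetes.io/vsphere-volume
# vSphere CSI:     csi.vsphere.vmware.com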

velero-plugin-for-vsphere doesn't support backing up in-tree volumes. That's why you run into the failure in velero-plugin-for-vsphere. The related logging was not improved until v1.2.0, which is why the error message is not self-explanatory.
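
The same distinction is visible on the PV objects themselves: in-tree vSphere volumes carry spec.vsphereVolume, while CSI-provisioned ones carry spec.csi with driver csi.vsphere.vmware.com. A sketch to list them:

kubectl get pv -o custom-columns='NAME:.metadata.name,CSI-DRIVER:.spec.csi.driver,INTREE-PATH:.spec.vsphereVolume.volumePath'
# rows with a CSI-DRIVER value are CSI volumes; rows with an INTREE-PATH value are in-tree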

In terms of the errors in the restic backup, we will need help from the Velero team. Hi @zubron, please feel free to transfer this issue back.

Restic Backups:
  Completed:
    e-xdm-application/dataflow-server-6c78bb6ff6-v46bq: data, db2zos-custom, tasks, tmp
    e-xdm-application/neo4j-6d9b6454cd-5fc24: tmp, var-lib-neo4j
    e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-conf, nginx-tmp, nginx-var-lib-logs, web-certificates
  Failed:
    e-xdm-application/web-ui-6f44898db7-2mxnc: nginx-share
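
In the meantime, the failed pod volume backup records its error in status; something like this should surface it (a sketch, assuming the stock velero.io/backup-name label):

kubectl -n velero get podvolumebackups -l velero.io/backup-name=e-xdm-application-manual1 -o yaml
# look for status.phase: Failed and the accompanying status.message for the nginx-share volume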

— lintongj, Nov 11 '21 20:11