velero
Crash with SIGSEGV while finalizing backup of a PVC with CSI on AWS EKS
What steps did you take and what happened: Velero 1.9.0 is deployed on AWS EKS 1.22 via the official Helm chart v2.31.0. Plugins: AWS v1.5.0, CSI v0.3.0.
During a backup, right after the CSI snapshots are created (both the VolumeSnapshot and VolumeSnapshotContent reach their expected statuses, and the EBS snapshot displays as ready in the AWS console) and the backup is about to wrap up, Velero crashes with SIGSEGV. The backup stays in a Failed
status.
I retried multiple times and it always ends this way.
What did you expect to happen: Backup succeeds and is restorable.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename>
to generate the support bundle and attach it to this issue; for more options, please refer to velero debug --help
I cannot provide this at the moment.
However, here are the logs printed prior to the crash:
2022/08/11 16:23:56 info Waiting for CSI driver to reconcile volumesnapshot ohlc/velero-questdb-questdb-0-kkltb. Retrying in 5s
2022/08/11 16:24:01 info Waiting for CSI driver to reconcile volumesnapshot ohlc/velero-questdb-questdb-0-kkltb. Retrying in 5s
2022/08/11 16:24:06 info Waiting for CSI driver to reconcile volumesnapshot ohlc/velero-questdb-questdb-0-kkltb. Retrying in 5s
2022/08/11 16:24:11 info Waiting for CSI driver to reconcile volumesnapshot ohlc/velero-questdb-questdb-0-kkltb. Retrying in 5s
time="2022-08-11T16:24:12Z" level=info msg="Validating BackupStorageLocation" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:130"
time="2022-08-11T16:24:12Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:115"
time="2022-08-11T16:24:12Z" level=info msg="Validating BackupStorageLocation" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:130"
time="2022-08-11T16:24:12Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:115"
2022/08/11 16:24:16 info Waiting for CSI driver to reconcile volumesnapshot ohlc/velero-questdb-questdb-0-kkltb. Retrying in 5s
2022/08/11 16:24:21 info Waiting for CSI driver to reconcile volumesnapshot ohlc/velero-questdb-questdb-0-kkltb. Retrying in 5s
I0811 16:24:23.683210 1 request.go:665] Waited for 1.046988495s due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
2022/08/11 16:24:26 info Waiting for CSI driver to reconcile volumesnapshot ohlc/velero-questdb-questdb-0-kkltb. Retrying in 5s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x19bfdcd]
goroutine 5971 [running]:
github.com/vmware-tanzu/velero/pkg/controller.(*backupController).deleteVolumeSnapshot.func1(0xc00045f040)
/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_controller.go:931 +0xad
created by github.com/vmware-tanzu/velero/pkg/controller.(*backupController).deleteVolumeSnapshot
/go/src/github.com/vmware-tanzu/velero/pkg/controller/backup_controller.go:927 +0xf7
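For illustration of the class of failure the trace points at (a nil pointer dereference inside deleteVolumeSnapshot), here is a minimal Go sketch. The types below are hypothetical, simplified stand-ins, not the real external-snapshotter or Velero definitions; the point is only that dereferencing a snapshot's Status before the CSI driver has populated it panics with exactly this kind of SIGSEGV, while a guarded accessor does not:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the snapshot API types; the real
// definitions live in kubernetes-csi/external-snapshotter.
type VolumeSnapshotStatus struct {
	BoundVolumeSnapshotContentName *string
}

type VolumeSnapshot struct {
	Status *VolumeSnapshotStatus // nil until the CSI driver reconciles the snapshot
}

// contentName guards both pointer hops. Dereferencing vs.Status (or the
// field inside it) while it is still nil is the class of bug that produces
// "invalid memory address or nil pointer dereference".
func contentName(vs *VolumeSnapshot) (string, bool) {
	if vs == nil || vs.Status == nil || vs.Status.BoundVolumeSnapshotContentName == nil {
		return "", false
	}
	return *vs.Status.BoundVolumeSnapshotContentName, true
}

func main() {
	notReady := &VolumeSnapshot{} // Status not yet populated by the driver
	if _, ok := contentName(notReady); !ok {
		fmt.Println("snapshot not reconciled yet; skipping")
	}

	bound := "snapcontent-56d87f8f"
	ready := &VolumeSnapshot{Status: &VolumeSnapshotStatus{BoundVolumeSnapshotContentName: &bound}}
	if name, ok := contentName(ready); ok {
		fmt.Println("bound to", name)
	}
}
```

Given that the logs show the snapshot was still being reconciled when the crash hit, a race of this shape would be consistent with the observed behavior, though only the maintainers can confirm the actual cause at backup_controller.go:931.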
One of the backups in question, in YAML format:
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade,post-rollback
    helm.sh/hook-delete-policy: before-hook-creation
    velero.io/source-cluster-k8s-gitversion: v1.22.10-eks-84b4fe6
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: 22+
  creationTimestamp: "2022-08-11T23:00:39Z"
  generation: 5
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    helm.sh/chart: velero-2.31.0
    velero.io/schedule-name: velero-questdb-pvc
    velero.io/storage-location: default
  name: velero-questdb-pvc-20220811230039
  namespace: velero
  resourceVersion: "29774925"
  uid: 6358f885-1184-45a6-922b-9b87b33054c1
spec:
  defaultVolumesToRestic: false
  hooks: {}
  includeClusterResources: true
  includedNamespaces:
  - ohlc
  includedResources:
  - pvc
  - pv
  labelSelector:
    matchLabels:
      app.kubernetes.io/instance: questdb
      app.kubernetes.io/name: questdb
  metadata: {}
  snapshotVolumes: true
  storageLocation: default
  ttl: 168h0m0s
  volumeSnapshotLocations:
  - default
status:
  completionTimestamp: "2022-08-11T23:00:49Z"
  expiration: "2022-08-18T23:00:39Z"
  failureReason: get a backup with status "InProgress" during the server starting,
    mark it as "Failed"
  formatVersion: 1.1.0
  phase: Failed
  progress:
    itemsBackedUp: 2
    totalItems: 2
  startTimestamp: "2022-08-11T23:00:39Z"
  version: 1
A describe of one of the VolumeSnapshots created by the backup:
Name:         velero-questdb-questdb-0-x84zb
Namespace:    ohlc
Labels:       velero.io/backup-name=velero-questdb-pvc-20220811230039
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2022-08-11T23:00:39Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  Generate Name:  velero-questdb-questdb-0-
  Generation:     1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection":
          v:"snapshot.storage.kubernetes.io/volumesnapshot-bound-protection":
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-08-11T23:00:39Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
        f:labels:
          .:
          f:velero.io/backup-name:
      f:spec:
        .:
        f:source:
          .:
          f:persistentVolumeClaimName:
        f:volumeSnapshotClassName:
    Manager:      velero-plugin-for-csi
    Operation:    Update
    Time:         2022-08-11T23:00:39Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:boundVolumeSnapshotContentName:
        f:creationTime:
        f:readyToUse:
        f:restoreSize:
    Manager:         Go-http-client
    Operation:       Update
    Subresource:     status
    Time:            2022-08-11T23:00:40Z
  Resource Version:  29774856
  UID:               56d87f8f-5a15-4c36-9930-35359c2c23c1
Spec:
  Source:
    Persistent Volume Claim Name:  questdb-questdb-0
  Volume Snapshot Class Name:      questdb-vsc
Status:
  Bound Volume Snapshot Content Name:  snapcontent-56d87f8f-5a15-4c36-9930-35359c2c23c1
  Creation Time:                       2022-08-11T23:00:40Z
  Ready To Use:                        true
  Restore Size:                        50Gi
Events:                                <none>
A describe of one of the VolumeSnapshotContents created by the backup:
Name:         snapcontent-56d87f8f-5a15-4c36-9930-35359c2c23c1
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshotContent
Metadata:
  Creation Timestamp:  2022-08-11T23:00:39Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
  Generation:  1
  Managed Fields:
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection":
      f:spec:
        .:
        f:deletionPolicy:
        f:driver:
        f:source:
          .:
          f:volumeHandle:
        f:volumeSnapshotClassName:
        f:volumeSnapshotRef:
          .:
          f:apiVersion:
          f:kind:
          f:name:
          f:namespace:
          f:resourceVersion:
          f:uid:
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-08-11T23:00:40Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:creationTime:
        f:readyToUse:
        f:restoreSize:
        f:snapshotHandle:
    Manager:         Go-http-client
    Operation:       Update
    Subresource:     status
    Time:            2022-08-11T23:00:40Z
  Resource Version:  29774845
  UID:               dd15120a-fa73-4a9f-b3d7-28102e169489
Spec:
  Deletion Policy:  Delete
  Driver:           ebs.csi.aws.com
  Source:
    Volume Handle:             vol-069935c75bcc9a2db
  Volume Snapshot Class Name:  questdb-vsc
  Volume Snapshot Ref:
    API Version:       snapshot.storage.k8s.io/v1
    Kind:              VolumeSnapshot
    Name:              velero-questdb-questdb-0-x84zb
    Namespace:         ohlc
    Resource Version:  29774811
    UID:               56d87f8f-5a15-4c36-9930-35359c2c23c1
Status:
  Creation Time:    1660258840065000000
  Ready To Use:     true
  Restore Size:     53687091200
  Snapshot Handle:  snap-08a0e7632dac36f3f
Events:             <none>
Helm chart values overrides:
configuration:
  features: EnableCSI
  provider: aws
  backupStorageLocation:
    name: default
    provider: aws
    bucket: ***-velero-backup-storage
    config:
      region: eu-central-1
  volumeSnapshotLocation:
    name: default
    provider: aws
    config:
      region: eu-central-1
credentials:
  useSecret: false
initContainers:
- name: velero-plugin-for-aws
  image: velero/velero-plugin-for-aws:v1.5.0
  imagePullPolicy: IfNotPresent
  volumeMounts:
  - mountPath: /target
    name: plugins
- name: velero-plugin-for-csi
  image: velero/velero-plugin-for-csi:v0.3.0
  imagePullPolicy: IfNotPresent
  volumeMounts:
  - mountPath: /target
    name: plugins
schedules:
  questdb-pvc:
    disabled: false
    schedule: "0 23 * * 1,2,3,4,5"
    csiSnapshotTimeout: 60m
    template:
      ttl: "168h"
      includedNamespaces:
      - ohlc
      includedResources:
      - pvc
      - pv
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: questdb
          app.kubernetes.io/instance: questdb
      includeClusterResources: true
      snapshotVolumes: true
      storageLocation: default
      volumeSnapshotLocations:
      - default
serviceAccount:
  server:
    create: true
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::***:role/***-velero
Anything else you would like to add:
Environment:
- Velero version: 1.9.0
- velero-plugin-for-aws version: 1.5.0
- velero-plugin-for-csi version: 0.3.0
- Velero features: EnableCSI
- Helm chart version: 2.31.0
- Kubernetes version: v1.22.10-eks-84b4fe6
- Cloud provider or hardware configuration: AWS EKS
Vote on this issue!
This is an invitation to the Velero community to vote on issues. You can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
- :+1: for "I would like to see this bug fixed as soon as possible"
- :-1: for "There are more important bugs to focus on right now"