BackupRepository - error to ensure repository storage empty - found existing data in storage location

Open filipe-silva-magalhaes-alb opened this issue 1 year ago • 13 comments

What steps did you take and what happened: I ran a successful backup. I uninstalled Velero, reinstalled it, ran a new backup, and got the following error. Velero created the BackupRepository with the error: error to init backup repo: error to create repo with storage: error to ensure repository storage empty: found existing data in storage location

What did you expect to happen: I would expect the backup to succeed again, regardless of whether the bucket is empty. I don't want to have to delete the data from the bucket and lose the previous backups.

Logs: velero debug --backup teste4 bundle-2024-02-21-14-18-43.tar.gz

Anything else you would like to add:

kubectl -n velero describe backuprepository ccp-default-kopia-4zkr5 :

Name:         ccp-default-kopia-4zkr5
Namespace:    velero
Labels:       velero.io/repository-type=kopia
              velero.io/storage-location=default
              velero.io/volume-namespace=ccp
Annotations:  <none>
API Version:  velero.io/v1
Kind:         BackupRepository
Metadata:
  Generate Name:       ccp-default-kopia-
Spec:
  Backup Storage Location:  default
  Maintenance Frequency:    1h0m0s
  Repository Type:          kopia
  Restic Identifier:        s3:s3-eu-central-1.amazonaws.com/eks-backup-308173258961/restic/ccp
  Volume Namespace:         ccp
Status:
  Message:  error to create backup repo: error to create repo with storage: error to ensure repository storage empty: found existing data in storage location
  Phase:    NotReady
Events:     <none>

kubectl -n velero get backup teste4 -o yaml :

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    meta.helm.sh/release-name: velero
    meta.helm.sh/release-namespace: velero
    velero.io/resource-timeout: 10m0s
    velero.io/source-cluster-k8s-gitversion: v1.27.9-eks-5e0fdde
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: 27+
  creationTimestamp: "2024-02-21T13:35:50Z"
  generation: 73
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    helm.sh/chart: velero-5.3.0
    velero.io/schedule-name: velero-eks-backup
    velero.io/storage-location: default
  name: teste4
  namespace: velero
  resourceVersion: "74760882"
  uid: fc0db8df-e50f-4a4f-b3bb-8c46756e66e5
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  hooks: {}
  itemOperationTimeout: 4h0m0s
  metadata: {}
  resourcePolicy:
    kind: configmap
    name: velero-efs-resourcepolicy
  snapshotMoveData: true
  storageLocation: default
  ttl: 240h0m0s
  volumeSnapshotLocations:
  - default

Environment:

  • Velero version (use velero version): v1.13.0
  • Velero features (use velero client config get features): features: EnableCSI
  • Kubernetes version (use kubectl version): v1.27.9-eks-5e0fdde
  • Kubernetes installer & version: v1.24.1
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues. You can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • :+1: for "I would like to see this bug fixed as soon as possible"
  • :-1: for "There are more important bugs to focus on right now"

In the normal case, Velero shouldn't fail with an existing backup repository, because Velero first tries to connect to the repository and only creates a new one if the connection fails.

Could you please check whether your repository is still intact? There should be metadata files in the backup repository, as in the attached screenshot.
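
For example, using the bucket name from the BackupRepository above and assuming the usual kopia/<namespace>/ layout under the storage location, a quick check for the metadata object would look something like this:

aws s3 ls s3://eks-backup-308173258961/kopia/ccp/kopia.repository

If that object is missing, the repository is no longer intact.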

blackpiglet commented Feb 22 '24 08:02

The backup repository for the namespace "ccp" was not created in the bucket, but the folder still exists there.

k get backuprepository -n velero

NAME                             AGE   REPOSITORY TYPE
ccp-default-kopia-b9hsl          20h   kopia
monitoring-default-kopia-qxsns   20h   kopia

k describe backuprepository -n velero ccp-default-kopia-b9hsl | grep Message

Message:  error to create backup repo: error to create repo with storage: error to ensure repository storage empty: found existing data in storage location

aws s3 ls --recursive s3://eks-backup-308173258961 | grep repository

2024-02-20 15:12:27       1075 kopia/monitoring/kopia.repository

aws s3 ls --recursive s3://eks-backup-308173258961 | grep ccp

2024-02-07 12:15:42       4953 kopia/ccp/_log_20240207121540_96b9_1707308140_1707308141_1_61dd9d822a48b9347151554792cdec54
2024-02-07 12:15:44       4989 kopia/ccp/_log_20240207121542_ac05_1707308142_1707308143_1_a00a29adf4b089aaa324d246602805e8
2024-02-07 12:15:46       5028 kopia/ccp/_log_20240207121544_e7d9_1707308144_1707308145_1_7fe9885b6e1223937461f2ff0cb1856a
2024-02-07 12:15:45       4298 kopia/ccp/q62cc98a40dcf13ccd32b2d19475e5ddf-s79d755f1bbf7a5df125

The Kopia repository file is gone; I think that is what caused the failure. Velero has encountered similar issues from time to time, but I don't think Velero is the one making the mess.

Please check whether any lifecycle-related policy is configured for this bucket. The repository file is never updated after the repository is initialized, so it would be among the first objects deleted if an expiration rule is in place. https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html
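
If you want to double-check from the CLI, something like this should show whether any lifecycle rules exist on the bucket (it should return a NoSuchLifecycleConfiguration error when none are set):

aws s3api get-bucket-lifecycle-configuration --bucket eks-backup-308173258961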

blackpiglet commented Feb 23 '24 08:02

Hello, this bucket has no lifecycle policy associated with it.

OK. Is it possible to find a trace of the object deletion in any AWS service?

blackpiglet commented Feb 23 '24 10:02

Unfortunately not. I also tried the workaround from #6909, but it didn't work.

@filipe-silva-magalhaes-alb I think the only option left here is to delete the stale backup repository data in the object storage and delete the failed BackupRepository in the cluster.
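
For reference, the cleanup would look roughly like this (destructive, so double-check the prefix first); Velero should create a fresh BackupRepository on the next backup of that namespace:

aws s3 rm --recursive s3://eks-backup-308173258961/kopia/ccp/
kubectl -n velero delete backuprepository ccp-default-kopia-b9hsl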

blackpiglet commented Feb 25 '24 03:02

@filipe-silva-magalhaes-alb I haven't reproduced this issue in my EKS environment yet. Is it easy to reproduce in your environment?

I ran a successful backup. I uninstalled Velero, reinstalled it, ran a new backup, and got the following error

If this issue can be reproduced just by running those commands in sequence, it's worth finding the root cause.

If you can reproduce this, would you mind helping us debug it by turning on some AWS S3 logging? https://repost.aws/knowledge-center/s3-audit-deleted-missing-objects CloudTrail data events watching the specific object's deletion should be the most convenient option.
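
Assuming a CloudTrail trail already exists, an S3 data-event selector scoped to the kopia/ prefix would be enough to capture the DeleteObject calls, for example (the trail name is a placeholder):

aws cloudtrail put-event-selectors \
  --trail-name <your-trail> \
  --event-selectors '[{"ReadWriteType":"WriteOnly","IncludeManagementEvents":false,"DataResources":[{"Type":"AWS::S3::Object","Values":["arn:aws:s3:::eks-backup-308173258961/kopia/"]}]}]'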

blackpiglet commented Feb 26 '24 08:02

I upgraded to Velero 1.13 and then started getting the errors below. The first backup against an empty S3 bucket succeeded; the second backup failed. I am running Velero with 'readOnlyRootFilesystem: true' using the default image.

Any help appreciated

Errors:
  Velero:
    name: /mongodb-0
    message: /Error backing up item
    error: /failed to wait BackupRepository: backup repository is not ready: error to connect to backup repo: error to connect repo with storage: error to connect to repository: unable to write config file: unable to create config directory: mkdir /home/cnb/udmrepo: read-only file system
    name: /mongodb-1
    message: /Error backing up item
    error: /failed to wait BackupRepository: backup repository is not ready: error to connect to backup repo: error to connect repo with storage: error to connect to repository: unable to write config file: unable to create config directory: mkdir /home/cnb/udmrepo: read-only file system
    name: /mongodb-2
    message: /Error backing up item
    error: /failed to wait BackupRepository: backup repository is not ready: error to connect to backup repo: error to connect repo with storage: error to connect to repository: unable to write config file: unable to create config directory: mkdir /home/cnb/udmrepo: read-only file system
  Cluster: <none>
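
In case it helps with triage, the read-only setting on the server deployment can be confirmed with something like the following (deployment name and container index assumed to match a default chart install):

kubectl -n velero get deploy velero -o jsonpath='{.spec.template.spec.containers[0].securityContext}'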

kingnarmer commented Mar 01 '24 15:03

I was able to get the backup to run successfully by adding this section to the chart values, with Velero running on a read-only root filesystem (an example upgrade command follows the snippet).

extraVolumes:
- emptyDir: {}
  name: udmrepo
- emptyDir: {}
  name: cache

extraVolumeMounts:
- mountPath: /home/cnb/udmrepo
  name: udmrepo
- mountPath: /home/cnb/.cache
  name: cache
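
For anyone else doing the same, applying it is just a normal chart upgrade, for example (the repo alias and values file name are placeholders for your own setup):

helm upgrade velero vmware-tanzu/velero -n velero --reuse-values -f extra-values.yaml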

kingnarmer commented Mar 05 '24 02:03

@kingnarmer Sorry, I missed this thread notification.

I suppose you are using Kopia as the uploader, and running the container as user cnb?

blackpiglet commented Mar 06 '24 06:03

@blackpiglet I use Kopia as the uploader. I didn't change whatever default user the container uses.

kingnarmer commented Mar 08 '24 21:03

@kingnarmer Thanks. This topic may be worth noting somewhere in the Velero documentation.

blackpiglet commented Mar 12 '24 08:03