
PVC restore in EKS does not work with IRSA

Open bit-herder opened this issue 2 years ago • 3 comments

What steps did you take and what happened: I recently switched Velero to IRSA, using the same IAM policy as before (the one specified in your docs). I then wiped the cluster, installed Velero (again with IRSA), and ran a restore. Everything restored OK except the PVCs, which failed with a 403 unauthorized error. This was odd because S3 access was clearly working, which suggests IRSA itself was set up correctly.

I then reverted Velero to using a regular IAM user, and the restore worked fine. I think there is a bug somewhere in the EBS restore path when IRSA is used. As I said, the policy was the same, so I don't know what else it could be.

What did you expect to happen:

I expected the PVCs to be restored.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help.

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

  • Velero version (use velero version):
Client:
	Version: v1.8.1
	Git commit: -
Server:
	Version: v1.8.1
  • Velero features (use velero client config get features):
features: <NOT SET>
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:51:05Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.12-eks-a64ea69", GitCommit:"d4336843ba36120e9ed1491fddff5f2fec33eb77", GitTreeState:"clean", BuildDate:"2022-05-12T18:29:27Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1
  • Kubernetes installer & version: terraform-aws-eks (18.21.0)
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): amazon-linux-2 bundle-2022-06-01-16-41-02.tar.gz

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • :+1: for "I would like to see this bug fixed as soon as possible"
  • :-1: for "There are more important bugs to focus on right now"

bit-herder avatar Jun 01 '22 21:06 bit-herder

Am I alone in this or have other people been having this issue?

bit-herder avatar Jul 08 '22 14:07 bit-herder

@reasonerjt any news on this?

bit-herder avatar Jul 13 '22 19:07 bit-herder

This is important to me as well!!

alievrouw avatar Oct 03 '22 20:10 alievrouw

@bit-herder I tried this in my lab (Velero v1.10, aws-plugin v1.6) and IRSA does seem to work.

I also checked the restore log in your log bundle and found only a bunch of errors from calls to a webhook:

cat ./restore_restore-1.log|grep "level=error"
time="2022-06-01T19:56:08Z" level=error msg="error restoring ingress-sdl-connector: Internal error occurred: failed calling webhook \"validate.nginx.ingress.kubernetes.io\": Post \"https://nginx-ingress-ingress-nginx-controller-admission.nginx-ingress.svc:443/networking/v1/ingresses?timeout=10s\": context deadline exceeded" logSource="pkg/restore/restore.go:1287" restore=velero/restore-1
time="2022-06-01T19:56:18Z" level=error msg="error restoring ingress-dev-proxy: Internal error occurred: failed calling webhook \"validate.nginx.ingress.kubernetes.io\": Post \"https://nginx-ingress-ingress-nginx-controller-admission.nginx-ingress.svc:443/networking/v1/ingresses?timeout=10s\": context deadline exceeded" logSource="pkg/restore/restore.go:1287" restore=velero/restore-1
......
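
The grep above prints every error line in full. As a rough sketch, the same log can be condensed to one line per failed resource. The function name summarize_restore_errors is hypothetical, and the sed pattern assumes the msg="error restoring <name>: ..." shape seen in the excerpt; error lines with a different message shape are silently skipped.

```shell
# Hypothetical helper: list the resources that failed to restore, with a
# count per resource, from a Velero restore log shaped like the excerpt above.
summarize_restore_errors() {
  # Keep only error-level lines, extract the resource name that follows
  # "error restoring ", then count occurrences per name.
  grep 'level=error' "$1" \
    | sed -n 's/.*msg="error restoring \([^:]*\):.*/\1/p' \
    | sort | uniq -c
}

# Usage (log file name taken from the excerpt above):
# summarize_restore_errors ./restore_restore-1.log
```

If no PVC or PV names show up in that summary, the restore log itself does not record the volume failure, which is consistent with the observation above.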

@alievrouw Could you clarify whether you work with @bit-herder, or whether you see a similar error when using IRSA?

reasonerjt avatar Jan 13 '23 13:01 reasonerjt

Additionally, it seems that during installation there's no option for the user to set the service account for the velero pod.
@sseago Was it discussed and determined not to add it? If there's no objection, I can write a PR to add that option.

reasonerjt avatar Jan 16 '23 05:01 reasonerjt

@reasonerjt I don't think I've heard this particular issue coming up. Making it configurable at install makes sense, though, as long as the default behavior (with no user setting) is equivalent to current behavior.

sseago avatar Jan 16 '23 20:01 sseago

PR #5802, which adds an option for the user to set the service account, has been merged.
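
For anyone wiring this up with IRSA, a minimal sketch of how a pre-created service account might look. The helper below only prints a ServiceAccount manifest carrying the standard eks.amazonaws.com/role-arn annotation; the function name, the account name velero-irsa, and the role ARN are placeholders. The --service-account-name flag in the usage comment is assumed to be the option this PR added, so check velero install --help on your version before relying on it.

```shell
# Hypothetical helper: print a ServiceAccount manifest annotated for IRSA.
# $1 = service account name, $2 = IAM role ARN, $3 = namespace (default: velero)
irsa_service_account_manifest() {
  sa_name=$1
  role_arn=$2
  namespace=${3:-velero}
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa_name}
  namespace: ${namespace}
  annotations:
    eks.amazonaws.com/role-arn: ${role_arn}
EOF
}

# Usage (placeholder values; requires cluster access):
# irsa_service_account_manifest velero-irsa arn:aws:iam::111122223333:role/velero | kubectl apply -f -
# velero install --service-account-name velero-irsa   # plus your usual install flags (flag name assumed)
```

A service account created this way may also need the RBAC bindings that a default install would otherwise set up.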

I'm closing this issue as non-reproducible.

reasonerjt avatar Feb 01 '23 09:02 reasonerjt