[Azure] Add AAD Workload Identity support

Open pinlast opened this issue 2 years ago • 2 comments

Describe the problem/challenge you have: We use AAD Workload Identity to grant permissions in our Azure Kubernetes clusters, and Velero does not seem to support it; it can only read credentials from a file or from environment variables. Since Azure is deprecating pod identity in favor of workload identity, adding support would make sense.

Describe the solution you'd like: Add AAD Workload Identity support, getting the token from the azure-identity-token secret.

Environment:

  • helm version: version.BuildInfo{Version:"v3.9.0", GitCommit:"7ceeda6c585217a19a1131663d8cd1f7d641b2a7", GitTreeState:"clean", GoVersion:"go1.18.2"}

  • helm chart version and app version: chart: velero-2.30.1 app: 1.9.0

  • Kubernetes version : Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T18:03:20Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"5a97ee6d15525f6e4a1c2646bf1dfd2ebd5220b5", GitTreeState:"clean", BuildDate:"2022-06-15T04:26:33Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes installer & version: aks

  • Cloud provider or hardware configuration: azure

pinlast avatar Jul 08 '22 10:07 pinlast

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 16 '22 00:09 stale[bot]

Is there something working that I can test? I'm planning to set up Velero for our AKS clusters, and it would be nice to deploy directly with "Azure Workload Identity".

hunter86bg avatar Sep 20 '22 13:09 hunter86bg

I tried to use Velero with the proxy mode of AAD WI (https://azure.github.io/azure-workload-identity/docs/topics/service-account-labels-and-annotations.html#annotations-1), enabled with the annotation azure.workload.identity/inject-proxy-sidecar: true

It works, but my backups are shown as "failed". The only error I found is:

velero-669595d7f8-6clb2 velero time="2022-10-31T08:39:11Z" level=error msg="backup failed" controller=backup error="[rpc error: code = Unavailable desc = error reading from server: EOF, rpc error: code = Unavailable desc = connection error: desc = \"transport: error while dialing: dial unix /tmp/plugin1080651534: connect: connection refused\"]" key=kube-system/backup logSource="pkg/controller/backup_controller.go:301"

The backup itself completes, though: I can see it in my storage account. I also tried with disk snapshots; that works too, but the backup is again shown as failed. I was able to restore from both without any issue.

flbla avatar Oct 31 '22 08:10 flbla

After some investigation, it turned out that the memory request/limit of my pod was too low, which caused this issue.

So if you want to use AAD Workload Identity with Velero, you can. This is what I added to the Helm values:

serviceAccount:
  server:
    create: true
    name: velero-server
    labels:
      azure.workload.identity/use: "true"
    annotations: 
      azure.workload.identity/client-id: ${application_client_id}

podAnnotations:
  azure.workload.identity/inject-proxy-sidecar: "true"

Before that, you'll need to create the AAD identity: https://azure.github.io/azure-workload-identity/docs/quick-start.html

flbla avatar Oct 31 '22 09:10 flbla

Thanks, I will give it a try.

hunter86bg avatar Oct 31 '22 11:10 hunter86bg

Is there any progress on this? I don't want to have to use the proxy.

stevehipwell avatar Nov 17 '22 15:11 stevehipwell

@pinlast @hunter86bg @stevehipwell As I understand it, nothing needs to change on the Velero side to support Azure AD workload identity. With the Velero Helm chart, as @flbla's comment shows, you can already set the related configuration (service account, labels, annotations); when installing from the CLI, you can run velero install --dry-run first and then edit the generated YAML files.

Correct me if I'm wrong.

ywk253100 avatar Jan 09 '23 02:01 ywk253100

@ywk253100 This ticket is about getting explicit support for AAD Workload Identity directly into Velero. @flbla's solution is a workaround that uses a sidecar (i.e. an additional container) that offers backwards compatibility with AAD Pod Identity.

It would still be better for Velero to properly support AAD Workload Identity without needing the sidecar.

pearj avatar Jan 09 '23 02:01 pearj

I agree with @pearj that proper support for AAD Workload Identity would be better.

I don't know Go well enough to help, but there are links and examples here:

  • https://github.com/Azure/azure-workload-identity/blob/main/examples/msal-go/main.go
  • https://azure.github.io/azure-workload-identity/docs/topics/language-specific-examples/azure-identity-sdk.html
  • https://azure.github.io/azure-workload-identity/docs/topics/language-specific-examples/msal.html

adamrushuk avatar Jan 09 '23 08:01 adamrushuk

Thanks @pearj @adamrushuk. Let's put it into the 1.11 milestone.

There is a PR that updates the Azure libraries used by the Velero Azure plugin; it may help with this issue as well. Let's verify after the PR is merged.

ywk253100 avatar Jan 09 '23 08:01 ywk253100

@ywk253100 It looks like that PR updates azure-sdk-for-go/sdk/azidentity to v1.2.0, but full support does not arrive until v1.3.0-beta.1 (the most recent release at the time of writing). It seems that v1.3.0-beta.1 supports AAD Workload Identity automatically as long as the correct environment variables are present (which they should be). That is probably the easiest way to implement this. That same issue also has example code in case you need to stay on sdk/azidentity v1.2.0 for some reason.
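
To check whether a pod actually has "the correct environment variables", something like the following could be run inside the container. This is only a rough sketch, not Velero or SDK code: it assumes the four variables that the Azure Workload Identity webhook documents as injected (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, AZURE_AUTHORITY_HOST), and missingEnv is a hypothetical helper name. It uses only the Go standard library and does not call azidentity.

```go
package main

import (
	"fmt"
	"os"
)

// workloadIdentityEnv lists the variables the Azure Workload Identity
// webhook is documented to inject into labelled pods (assumption: the
// pod went through the mutating webhook).
var workloadIdentityEnv = []string{
	"AZURE_CLIENT_ID",
	"AZURE_TENANT_ID",
	"AZURE_FEDERATED_TOKEN_FILE",
	"AZURE_AUTHORITY_HOST",
}

// missingEnv returns every name in workloadIdentityEnv for which lookup
// reports no value or an empty value. Hypothetical helper for
// illustration only; it is not part of Velero or the Azure SDK.
func missingEnv(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range workloadIdentityEnv {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// os.LookupEnv has the (value, found) shape that missingEnv expects.
	if missing := missingEnv(os.LookupEnv); len(missing) > 0 {
		fmt.Println("workload identity env incomplete, missing:", missing)
		return
	}
	fmt.Println("workload identity env looks complete")
}
```

Passing the lookup function in, instead of calling os.LookupEnv directly inside the helper, keeps the check easy to exercise with a fake environment.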

pearj avatar Jan 09 '23 13:01 pearj

@pearj Got it. I will test it after the PR is merged.

ywk253100 avatar Jan 10 '23 01:01 ywk253100

@flbla Does the Helm values workaround above also work with https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure? What do I have to configure there?

jkroepke avatar Feb 03 '23 09:02 jkroepke

As we are close to the FC date of v1.11.0 and the GA version of v1.3.0 of the Azure SDK sdk/azidentity hasn't been released yet, I'm going to move this issue out of the scope of v1.11.0. We'll fix it once v1.3.0 of the SDK is GA.

BTW, changing only the Velero Azure plugin isn't enough: Restic/Kopia, used by Velero, doesn't support AAD workload identity at the moment either. An issue has been opened for Kopia.

ywk253100 avatar Mar 15 '23 09:03 ywk253100

@flbla's Helm values workaround above does work for making backups and restoring them. However, I found that if I delete my HelmRelease, my pods get stuck in the Terminating state:

node-agent-zv8g6          2/2     Terminating   6 (24h ago)   24h
velero-54d6979d77-9dpxx   2/2     Terminating   6 (24h ago)   24h

We decided it is not worth it if pods can end up hanging. We'll wait for v1.12.

paytience avatar May 10 '23 09:05 paytience

@paytience It depends on the version of AAD workload identity you use. I had the same issue: https://github.com/Azure/azure-workload-identity/issues/774#issue-1606536025

flbla avatar May 10 '23 09:05 flbla

I understand. Is there a fix for this issue? Does a newer release of AAD workload identity fix the problem when using the sidecar annotation?

paytience avatar May 10 '23 11:05 paytience

@paytience With the latest version of AAD workload identity (1.1.0), I don't have the issue anymore.

flbla avatar Jun 20 '23 13:06 flbla

@flbla Unfortunately, we're running the AAD workload identity webhook through the AKS add-on, which uses webhook controller 1.0.0.

admincasper avatar Jun 20 '23 15:06 admincasper

Azure Workload Identity is now supported by the Velero Azure plugin; please refer to the doc: https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/README.md#option-2-use-azure-ad-workload-identity.

Please note that it is not supported by Kopia yet, so file system backups do not work with Azure Workload Identity.

ywk253100 avatar Jul 11 '23 08:07 ywk253100

@ywk253100 Actually, it is not available yet, since no release containing it has been made. In v1.7.1, workload identity is still not supported, unfortunately.

admincasper avatar Aug 03 '23 09:08 admincasper

The Kopia repository part is fixed by https://github.com/vmware-tanzu/velero/pull/6686

ywk253100 avatar Sep 19 '23 06:09 ywk253100

For anyone else stumbling upon this who wants to use the Helm chart, the following are the important Helm values:

podLabels:
  azure.workload.identity/use: "true"
labels:
  azure.workload.identity/use: "true"
serviceAccount:
  server:
    create: true
    name: velero-server
    annotations:
      azure.workload.identity/client-id: ${velero_mgid_client_id}
      azure.workload.identity/tenant-id: ${tenant_id}
configuration:
  backupStorageLocation:
  - provider: azure
    bucket: velero
    config:
      resourceGroup: ${backup_resource_group}
      storageAccount: ${backup_storage_account}
      useAAD: "true"

ccadruvi avatar Dec 13 '23 07:12 ccadruvi