loki icon indicating copy to clipboard operation
loki copied to clipboard

Azure storage with federated workload identity

Open leinad87 opened this issue 10 months ago • 10 comments

Is your feature request related to a problem? Please describe. To configure Azure storage, there are some options like managed identities or service principal. I don't like managed identities becase they can be used by any pod in the cluster. If you think about service principal, SAS or blob storage token, they have sensitive data that should be managed carefully and secrets are not easily configured: https://github.com/grafana/loki/issues/9143

Describe the solution you'd like Azure has introduce Federated Identity Credentials. This will allow to sync a Service Account with a Azure application with required permisions without passwords.

Could be possible to implement Azure Federated Workload Identity to authenticate Loki against Azure Storage Account?

leinad87 avatar Apr 15 '24 10:04 leinad87

Unless I am misinterpreting this PR, I think this is a documentation issue, not a feature support issue. It looks like Workload Identity is already implemented. I will be trying to get it working myself in the coming days. https://github.com/grafana/loki/pull/11802

Edit: Nevermind, I see that PR was specificly for the Loki Operator, not Loki itself. I'm not even sure where the relevant code is for this. I hope it's not this file, because that is using an old azure go library that has been effectively deprecated for more than a year now, so updating it will require more effort.

rowanmoul avatar May 02 '24 02:05 rowanmoul

Good news! Workload identity is actually supported already as of PR #7540 that I somehow missed before, it's just a poorly named option. The azure storage config value that you want is called useFederatedToken in the loki helm chart (use_federated_token if setting in loki's config.yaml) I have tested today and loki is able to successfully authenticate.

rowanmoul avatar May 08 '24 21:05 rowanmoul

I had to add this to the helm values to get it to work:

        loki:
          podLabels:
            azure.workload.identity/use: 'true'

valimail-scott avatar May 10 '24 22:05 valimail-scott

yes, I assumed that was a given when using workload identity. There are a number of other things that need to be setup, like a "federated credential" azure resource, and enabling some settings on your azure kubernetes cluster to allow it to issue OIDC tokens. You'll also need a label and annotation on the service account:

serviceAccount:
  annotations:
    azure.workload.identity/client-id: 'your-client-id-here'
  labels:
    azure.workload.identity/use: 'true'

rowanmoul avatar May 11 '24 00:05 rowanmoul

To be fair, not everything requires the podLabels. external-dns does, but external-secrets-operator works with only the serviceAccount annotation as an example.

valimail-scott avatar May 11 '24 16:05 valimail-scott

Tried this tonight, but getting a panic from ADAL(!?)

The workload identity token is added to the pod with the usual AZURE_ env variables...

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1bb19b9]

goroutine 1 [running]:
github.com/Azure/go-autorest/autorest/adal.(*ServicePrincipalToken).SetCustomRefreshFunc(...)
	/src/loki/vendor/github.com/Azure/go-autorest/autorest/adal/token.go:418
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.(*BlobStorage).getServicePrincipalToken(0xc001706e00, {0x2b87c90?, 0x2b87c98?})
	/src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:441 +0x279
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.(*BlobStorage).getOAuthToken(0xc001706e00)
	/src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:393 +0xe6
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.(*BlobStorage).newPipeline(0xc001706e00, {0xee6b280, 0x3, 0x14}, 0x0)
	/src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:377 +0x265
github.com/grafana/loki/v3/pkg/storage/chunk/client/azure.NewBlobStorage(0xc000676a10, {0xc000814040?, {0x325bfe8?, 0xc000bd6420?}}, {0xc000c75080?, 0xc001798698?, 0x4105e5?})
	/src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:201 +0x111
github.com/grafana/loki/v3/pkg/ruler/base.NewLegacyRuleStore({{0xc00082c560, 0x5}, {{0x2a7e063, 0xb}, {0xc000c99c50, 0x11}, {{0x0, 0x0}}, {0x0, 0x0}, ...}, ...}, ...)
	/src/loki/pkg/ruler/base/storage.go:99 +0x325
github.com/grafana/loki/v3/pkg/loki.(*Loki).initRulerStorage(0xc001708000)
	/src/loki/pkg/loki/modules.go:1153 +0x2fb
github.com/grafana/dskit/modules.(*Manager).initModule(0xc000aa43a8, {0x7ffd9c975324, 0x3}, 0x1?, 0xc0017000c0?)
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc00066e0c0, 0x1, 0xc000cc8180?})
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc001708000, {0x0?, {0x4?, 0x3?, 0x4912940?}})
	/src/loki/pkg/loki/loki.go:453 +0x9d
main.main()
	/src/loki/cmd/loki/main.go:122 +0x113b
Helm Values

    monitoring:
      serviceMonitor:
        enable: true
    loki:
      podLabels:
        azure.workload.identity/use: 'true'
      commonConfig:
        path_prefix: /var/loki
        replication_factor: 3
      compactor_address: '{{ include "loki.compactorAddress" . }}'
      schemaConfig:
        configs:
          - from: "2024-04-01"
            store: tsdb
            index:
              prefix: loki_index_
              period: 24h
            object_store: azure
            schema: v13
      storage:
        type: azure
        bucketNames:
          chunks: loki-chunks # Needs to pre-exist
          ruler: loki-rulers # Needs to pre-exist
          admin: loki-admin # Needs to pre-exist
        azure:
          accountName: redacted
          userAssignedId: "redacted"
          endpointSuffix: blob.core.windows.net
          useFederatedToken: true
    singleBinary:
      replicas: 0
      extraArgs:
        - "-config.expand-env=true"
    deploymentMode: SingleBinary

Richard87 avatar May 14 '24 21:05 Richard87

@Richard87 Indeed, I came to this issue board to raise an issue on this when I saw this thread already open. When using the useManagedIdentity: true flag an error appears from adal:

adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=<redacted-clientId>&resource=https%3A%2F%2F<redacted-storage-account-name>.blob.core.windows.net

This is quite concerning as the adal library has been deprecated for over a year now.

CaspervdKerk avatar May 16 '24 08:05 CaspervdKerk

This issue is about workload identity, not managed identity. While workload identity makes use of managed identities under the hood, it is a different authentication method.

I'm not a Loki maintainer, but I suspect it would be better to create a new issue either about managed identity authentication, or more generally about the deprecated ADAL library, which might get more attention. When Loki is updated to the Azure.Identity sub-package of the Azure SDK for Go it can make use of the DefaultAzureCredential which will auto-detect which authentication method you are using from a number of different options, including both Managed Identity and Workload Identity.

rowanmoul avatar May 16 '24 14:05 rowanmoul

These are the relevant helm values I am using successfully for Workload Identity on Loki Helm Chart 6.5.2

loki:
  podLabels:
    "azure.workload.identity/use": "true"
  storage:
    type: "azure"
    azure:
      accountName: "redacted"
      useFederatedToken: true
    bucketNames:
      chunks: "loki-chunk-store"
      ruler: "loki-ruler-store"
serviceAccount:
  annotations:
    "azure.workload.identity/client-id": "redacted"
  labels:
    "azure.workload.identity/use": "true"

rowanmoul avatar May 16 '24 17:05 rowanmoul

Thanks, I added azure.workload.identity/client-id annotation manually to the service account, and it stopped panicking!

(And when creating the bucket names in the correct storage account, everything worked great! 🙏 )

Richard87 avatar May 17 '24 18:05 Richard87

In case somebody is having any issue to populate the ConfigMap with proper Azure storage values for the chunks: just try disabling MinIO.

teksuo avatar Sep 25 '24 14:09 teksuo