loki

read/write/backend crash when using Azure Blob Storage with AAD workload identity

Status: Open. monaka opened this issue 1 year ago • 5 comments

Describe the bug

I tried to use Azure Blob Storage with AAD workload identity, and the read, write, and backend components all fail with errors.

To Reproduce

  1. Deploy Loki through an Argo CD Application with these parameters (the issue should also reproduce without Argo CD):
project: default
source:
  repoURL: 'https://grafana.github.io/helm-charts'
  targetRevision: 5.8.9
  helm:
    parameters:
      - name: backend.persistence.enableStatefulSetAutoDeletePVC
        value: 'false'
      - name: loki.podLabels.azure\.workload\.identity/use
        value: 'true'
        forceString: true
      - name: loki.storage.type
        value: azure
      - name: loki.storage.azure.accountName
        value: {{snip}}
      - name: loki.storage.azure.useFederatedToken
        value: 'true'
      - name: minio.enabled
        value: 'false'
      - name: monitoring.selfMonitoring.grafanaAgent.installOperator
        value: 'false'
      - name: read.persistence.enableStatefulSetAutoDeletePVC
        value: 'false'
  chart: loki
destination:
  server: 'https://kubernetes.default.svc'
  namespace: loki
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
  2. Wait and watch the pods.
  3. The components end up in CrashLoopBackOff.

Expected behavior

All components start up and reach the Running state.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1782a2d]

goroutine 1 [running]:
github.com/Azure/go-autorest/autorest/adal.(*ServicePrincipalToken).SetCustomRefreshFunc(...)
        /src/loki/vendor/github.com/Azure/go-autorest/autorest/adal/token.go:411
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getServicePrincipalToken(0xc0006a8800, {0x25c8db8?, 0x25c8dc0?})
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:414 +0x36d
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getOAuthToken(0xc0006a8800)
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:359 +0x105
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).newPipeline(0xc0006a8800, {0xee6b280, 0x3, 0x14}, 0x0)
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:343 +0x25b
github.com/grafana/loki/pkg/storage/chunk/client/azure.NewBlobStorage(0xc00027ad20, {0xc0004bc348?, {0x29fe7e0?, 0xc0004b76e0?}}, {0x0?, 0x0?, 0x0?})
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:197 +0x14c
github.com/grafana/loki/pkg/storage.NewObjectClient({_, _}, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, ...}, ...)
        /src/loki/pkg/storage/factory.go:515 +0x985
github.com/grafana/loki/pkg/storage.NewChunkClient({_, _}, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, ...}, ...)
        /src/loki/pkg/storage/factory.go:340 +0x6b4
github.com/grafana/loki/pkg/storage.(*store).chunkClientForPeriod(0xc000314600, {{0x17e466f3400}, {0xc000a30cd0, 0xe}, {0xc000a30ca8, 0x5}, {0xc000a30cc0, 0x3}, {{0xc000a30c80, 0xb}, ...}, ...})
        /src/loki/pkg/storage/store.go:185 +0x27c
github.com/grafana/loki/pkg/storage.(*store).init(0xc000314600)
        /src/loki/pkg/storage/store.go:155 +0xf8
github.com/grafana/loki/pkg/storage.NewStore({{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, {{{0x0}, 0x4000000000000000, ...}, ...}, ...}, ...)
        /src/loki/pkg/storage/store.go:147 +0xa3b
github.com/grafana/loki/pkg/loki.(*Loki).initStore(0xc000948000)
        /src/loki/pkg/loki/modules.go:655 +0x598
github.com/grafana/dskit/modules.(*Manager).initModule(0xc0004b9080, {0x7ffeef0c550b, 0x5}, 0x1?, 0xc000635c20?)
        /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120 +0x20a
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x856d54?, {0xc0008e4670, 0x1, 0xc0008e4940?})
        /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92 +0xf8
github.com/grafana/loki/pkg/loki.(*Loki).Run(0xc000948000, {0xc0008e8980?})
        /src/loki/pkg/loki/loki.go:457 +0x56
main.main()
        /src/loki/cmd/loki/main.go:110 +0xe65

monaka, Jul 17 '23

Additional info:

cert-manager, which also uses AAD workload identity, runs in the same AKS cluster without any trouble, so I believe the base settings are correct.

monaka, Jul 17 '23

I have no issues writing to an Azure storage account with AAD Workload Identity using the configuration below. Are the annotations configured on the serviceAccount? I would also double-check that the federated credentials and role assignments are properly configured in Azure.

loki:
  serviceAccount:
    annotations:
      azure.workload.identity/tenant-id: "<tenant-id-for-azure-account>"
      azure.workload.identity/client-id:  "<client-id-for-managed-identity>"
  loki:
    storage:
      type: azure
      azure:
        accountName: <storage-account-name>
        accountKey: null
        useManagedIdentity: false
        useFederatedToken: true

mikbonda, Aug 08 '23

I have the same issue in one of my subscriptions: as soon as I enable useFederatedToken, the crash appears. I have two subscriptions, a private one and a company one, and my private setup is much less complex than the company one. With the same configuration in my private subscription, everything works correctly.

In the company subscription, however, I get a runtime error. I have used workload identity there with other services without any issue, so I believe my configuration is correct. Still, I have to admit the crash is related to configuration: if I enable useFederatedToken in my private subscription without configuring it properly, the same panic appears there too, even though it works once configured properly. Either way, an "invalid memory address or nil pointer dereference" segmentation violation is clearly an application-side issue, and without a proper error message the underlying configuration problem is hard to find. It would be great if someone could look into this.

Environment: Kubernetes 1.26.6, Loki 2.8.2.

I believe you can reproduce this by enabling useFederatedToken with the azure storage type without doing the other configuration this feature requires. Thanks in advance for your help.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1782a2d]

goroutine 1 [running]:
github.com/Azure/go-autorest/autorest/adal.(*ServicePrincipalToken).SetCustomRefreshFunc(...)
        /src/loki/vendor/github.com/Azure/go-autorest/autorest/adal/token.go:411
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getServicePrincipalToken(0xc0000c2700, {0x25c8db8?, 0x25c8dc0?})
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:414 +0x36d
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getOAuthToken(0xc0000c2700)
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:359 +0x105
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).newPipeline(0xc0000c2700, {0xee6b280, 0x3, 0x14}, 0x0)
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:343 +0x25b
github.com/grafana/loki/pkg/storage/chunk/client/azure.NewBlobStorage(0xc0009a60f0, {0xc00011a6f0?, {0x29fe7e0?, 0xc0003d9f80?}}, {0x34630b8a000?, 0x8bb2c97000?, 0xdf8475800?})
        /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:197 +0x14c
github.com/grafana/loki/pkg/storage.NewObjectClient({_, _}, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, ...}, ...)
        /src/loki/pkg/storage/factory.go:515 +0x985
github.com/grafana/loki/pkg/loki.(*Loki).initUsageReport(0xc0005d6800)
        /src/loki/pkg/loki/modules.go:1182 +0x247
github.com/grafana/dskit/modules.(*Manager).initModule(0xc000117458, {0x7ffde5172c40, 0x7}, 0x1?, 0xc000567ce0?)
        /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120 +0x20a
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x856d54?, {0xc00074e080, 0x1, 0xc00074f8e0?})
        /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92 +0xf8
github.com/grafana/loki/pkg/loki.(*Loki).Run(0xc0005d6800, {0xc00065e740?})
        /src/loki/pkg/loki/loki.go:457 +0x56
main.main()
        /src/loki/cmd/loki/main.go:110 +0xe65
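The trace shows the panic happening inside SetCustomRefreshFunc, which suggests (but does not prove) that the method was called on a nil token. The following is a minimal sketch, using entirely hypothetical types and functions (`token`, `newToken`, `setRefreshFunc`, `buildToken`, `buildTokenUnchecked`, `panics`), not Loki's or go-autorest's actual code. It illustrates how ignoring a constructor's `(nil, err)` result produces exactly this kind of nil pointer dereference, and how checking the error turns it into a readable message instead.

```go
package main

import (
	"errors"
	"fmt"
)

// token is a hypothetical stand-in for an OAuth token type.
// It is NOT go-autorest's ServicePrincipalToken; it only illustrates the pattern.
type token struct {
	clientID    string
	refreshFunc func() (string, error)
}

// setRefreshFunc writes a field on the receiver, so calling it on a nil
// *token panics with "invalid memory address or nil pointer dereference".
func (t *token) setRefreshFunc(f func() (string, error)) {
	t.refreshFunc = f
}

// newToken is a hypothetical constructor that returns (nil, err)
// when its input is invalid, as many Go constructors do.
func newToken(clientID string) (*token, error) {
	if clientID == "" {
		return nil, errors.New("client ID is empty")
	}
	return &token{clientID: clientID}, nil
}

// buildToken checks the constructor's error before using the result,
// so a missing client ID surfaces as a descriptive error, not a panic.
func buildToken(clientID string) (*token, error) {
	tok, err := newToken(clientID)
	if err != nil {
		return nil, fmt.Errorf("creating token: %w", err)
	}
	tok.setRefreshFunc(func() (string, error) { return "dummy-token", nil })
	return tok, nil
}

// buildTokenUnchecked ignores the error: with an empty client ID it
// calls setRefreshFunc on a nil pointer and panics.
func buildTokenUnchecked(clientID string) *token {
	tok, _ := newToken(clientID)
	tok.setRefreshFunc(func() (string, error) { return "dummy-token", nil })
	return tok
}

// panics reports whether f panics, recovering so the program keeps running.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	if _, err := buildToken(""); err != nil {
		fmt.Println("checked:", err)
	}
	fmt.Println("unchecked panics:", panics(func() { buildTokenUnchecked("") }))
}
```

Running it prints `checked: creating token: client ID is empty` followed by `unchecked panics: true`. The checked variant is the behaviour KuDuG is asking for: the same misconfiguration, but reported as an error that names the missing input.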

KuDuG, Aug 10 '23

I noticed the same issue today after updating to the latest version of the loki-distributed chart, which we did because we wanted to use workload identity instead of pod identity. In our case it was quickly solved: I noticed that AZURE_CLIENT_ID on the pod was empty (the other injected environment variables were valid), and I fixed it by making sure the service account had the correct client-id annotation. After that there were no issues, because the permissions were already configured for the managed identity, so switching from pod identity to workload identity was easy.
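Since an empty AZURE_CLIENT_ID was the culprit here, a quick check of the injected environment can rule this cause in or out. The sketch below assumes the four variables the Azure Workload Identity mutating webhook normally injects (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, AZURE_AUTHORITY_HOST); verify that list against your installed webhook version. The names `workloadIdentityEnv` and `missingEnv` are hypothetical and not part of Loki.

```go
package main

import (
	"fmt"
	"os"
)

// workloadIdentityEnv lists the variables the Azure Workload Identity
// webhook is assumed to inject into pods labelled
// azure.workload.identity/use: "true".
var workloadIdentityEnv = []string{
	"AZURE_CLIENT_ID",
	"AZURE_TENANT_ID",
	"AZURE_FEDERATED_TOKEN_FILE",
	"AZURE_AUTHORITY_HOST",
}

// missingEnv returns the names from workloadIdentityEnv whose values are
// empty or unset according to getenv. Taking getenv as a parameter keeps
// the check testable without touching the real process environment.
func missingEnv(getenv func(string) string) []string {
	var missing []string
	for _, name := range workloadIdentityEnv {
		if getenv(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(os.Getenv); len(missing) > 0 {
		fmt.Println("empty or unset:", missing)
		return
	}
	fmt.Println("all workload identity variables are set")
}
```

If only AZURE_CLIENT_ID is reported, the situation matches the one described above, where the value is expected to come from the `azure.workload.identity/client-id` annotation on the service account.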

c3JpbmkK, Aug 28 '23

I have the same problem. @monaka did you find any solution?

RenePinnow, Feb 16 '24

same here

kamikaze, Aug 07 '24