loki
Crashes read/write/backend using Azure blob with AAD workload identity.
Describe the bug
I tried to use Azure Blob Storage with AAD workload identity. I got errors on the read/write/backend components.
To Reproduce
- Deploy Loki via an ArgoCD App with these parameters. (You can reproduce it without ArgoCD as well.)

```yaml
project: default
source:
  repoURL: 'https://grafana.github.io/helm-charts'
  targetRevision: 5.8.9
  helm:
    parameters:
      - name: backend.persistence.enableStatefulSetAutoDeletePVC
        value: 'false'
      - name: loki.podLabels.azure\.workload\.identity/use
        value: 'true'
        forceString: true
      - name: loki.storage.type
        value: azure
      - name: loki.storage.azure.accountName
        value: {{snip}}
      - name: loki.storage.azure.useFederatedToken
        value: 'true'
      - name: minio.enabled
        value: 'false'
      - name: monitoring.selfMonitoring.grafanaAgent.installOperator
        value: 'false'
      - name: read.persistence.enableStatefulSetAutoDeletePVC
        value: 'false'
  chart: loki
destination:
  server: 'https://kubernetes.default.svc'
  namespace: loki
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
```
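For reproducing without ArgoCD, the same parameters can be expressed as a plain Helm values file. This is a sketch: the value paths are taken from the ArgoCD parameters above, and the account name stays redacted as in the report.

```yaml
# values.yaml -- sketch derived from the ArgoCD parameters above
backend:
  persistence:
    enableStatefulSetAutoDeletePVC: false
read:
  persistence:
    enableStatefulSetAutoDeletePVC: false
loki:
  podLabels:
    azure.workload.identity/use: "true"
  storage:
    type: azure
    azure:
      accountName: "{{snip}}"   # redacted in the original report
      useFederatedToken: true
minio:
  enabled: false
monitoring:
  selfMonitoring:
    grafanaAgent:
      installOperator: false
```

Installed with something like `helm install loki grafana/loki --version 5.8.9 -n loki -f values.yaml` (release name and namespace are illustrative).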
- Just watch.
- Components go into CrashLoopBackOff.
Expected behavior
All components boot up and move to Running.
Environment:
- Infrastructure: Kubernetes
- Deployment tool: helm
Screenshots, Promtail config, or terminal output
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1782a2d]
goroutine 1 [running]:
github.com/Azure/go-autorest/autorest/adal.(*ServicePrincipalToken).SetCustomRefreshFunc(...)
    /src/loki/vendor/github.com/Azure/go-autorest/autorest/adal/token.go:411
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getServicePrincipalToken(0xc0006a8800, {0x25c8db8?, 0x25c8dc0?})
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:414 +0x36d
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getOAuthToken(0xc0006a8800)
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:359 +0x105
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).newPipeline(0xc0006a8800, {0xee6b280, 0x3, 0x14}, 0x0)
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:343 +0x25b
github.com/grafana/loki/pkg/storage/chunk/client/azure.NewBlobStorage(0xc00027ad20, {0xc0004bc348?, {0x29fe7e0?, 0xc0004b76e0?}}, {0x0?, 0x0?, 0x0?})
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:197 +0x14c
github.com/grafana/loki/pkg/storage.NewObjectClient({_, _}, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, ...}, ...)
    /src/loki/pkg/storage/factory.go:515 +0x985
github.com/grafana/loki/pkg/storage.NewChunkClient({_, _}, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, ...}, ...)
    /src/loki/pkg/storage/factory.go:340 +0x6b4
github.com/grafana/loki/pkg/storage.(*store).chunkClientForPeriod(0xc000314600, {{0x17e466f3400}, {0xc000a30cd0, 0xe}, {0xc000a30ca8, 0x5}, {0xc000a30cc0, 0x3}, {{0xc000a30c80, 0xb}, ...}, ...})
    /src/loki/pkg/storage/store.go:185 +0x27c
github.com/grafana/loki/pkg/storage.(*store).init(0xc000314600)
    /src/loki/pkg/storage/store.go:155 +0xf8
github.com/grafana/loki/pkg/storage.NewStore({{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, {{{0x0}, 0x4000000000000000, ...}, ...}, ...}, ...)
    /src/loki/pkg/storage/store.go:147 +0xa3b
github.com/grafana/loki/pkg/loki.(*Loki).initStore(0xc000948000)
    /src/loki/pkg/loki/modules.go:655 +0x598
github.com/grafana/dskit/modules.(*Manager).initModule(0xc0004b9080, {0x7ffeef0c550b, 0x5}, 0x1?, 0xc000635c20?)
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120 +0x20a
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x856d54?, {0xc0008e4670, 0x1, 0xc0008e4940?})
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92 +0xf8
github.com/grafana/loki/pkg/loki.(*Loki).Run(0xc000948000, {0xc0008e8980?})
    /src/loki/pkg/loki/loki.go:457 +0x56
main.main()
    /src/loki/cmd/loki/main.go:110 +0xe65
```
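For what it's worth, the shape of this crash is a classic Go nil-receiver panic: some code path obtains a nil token pointer without a corresponding error, and the caller then invokes a method on it. A minimal sketch of the pattern, with hypothetical names (this is not Loki's actual code):

```go
package main

import "fmt"

// token stands in for adal.ServicePrincipalToken; writing a field
// through a nil receiver dereferences nil and panics.
type token struct{ refresh func() }

func (t *token) SetCustomRefreshFunc(f func()) { t.refresh = f }

// getToken mimics a buggy lookup path: when the federated-token
// branch is misconfigured it returns (nil, nil) instead of an error.
func getToken(configured bool) (*token, error) {
	if !configured {
		return nil, nil // bug: should be a descriptive non-nil error
	}
	return &token{}, nil
}

// simulate runs the caller's check-error-then-use sequence and
// reports the recovered panic message, if any.
func simulate() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	tok, err := getToken(false)
	if err != nil {
		return "error returned"
	}
	tok.SetCustomRefreshFunc(func() {}) // nil receiver: panics here
	return "no panic"
}

func main() {
	fmt.Println(simulate())
	// prints: runtime error: invalid memory address or nil pointer dereference
}
```

If that is indeed what happens here, returning a descriptive error on the misconfigured path (missing token file, empty client ID, etc.) instead of a nil token would turn the panic into the actionable message the reporters below are asking for.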
Additional info:
There is a cert-manager using AAD workload identity in the same AKS cluster, and it works without trouble, so I believe the base settings are correct.
I have no issues writing to an Azure storage account with AAD workload identity using the configuration below. Are the annotations configured on the serviceAccount? I would also double-check that the federated credentials and role assignments are properly configured on Azure.
```yaml
loki:
  serviceAccount:
    annotations:
      azure.workload.identity/tenant-id: "<tenant-id-for-azure-account>"
      azure.workload.identity/client-id: "<client-id-for-managed-identity>"
  loki:
    storage:
      type: azure
      azure:
        accountName: <storage-account-name>
        accountKey: null
        useManagedIdentity: false
        useFederatedToken: true
```
I have the same issue with one of my subscriptions: as soon as I enable useFederatedToken, the crash appears. I have two subscriptions, a private one and a company one, and my private setup is far less complex than the company one. With this configuration in my private subscription everything works correctly, but in the company subscription I get the runtime error.
I have used workload identity with other workloads without any issue, so even though I believe my configuration is correct, I have to admit the occurrence is config-related: if I enable useFederatedToken without configuring the feature properly, the same crash also appears in my private subscription, where it otherwise works. Still, throwing an "invalid memory address or nil pointer dereference" segmentation violation is clearly an application-side issue, and it is hard to track down the configuration mistake without a proper error message, so it would be nice if someone could have a look at this.
Environment: Kubernetes 1.26.6, Loki app version 2.8.2. I believe you can reproduce this issue by enabling useFederatedToken with the azure storage type without making any of the other configuration required for this feature. Thank you for your help in advance.
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1782a2d]
goroutine 1 [running]:
github.com/Azure/go-autorest/autorest/adal.(*ServicePrincipalToken).SetCustomRefreshFunc(...)
    /src/loki/vendor/github.com/Azure/go-autorest/autorest/adal/token.go:411
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getServicePrincipalToken(0xc0000c2700, {0x25c8db8?, 0x25c8dc0?})
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:414 +0x36d
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).getOAuthToken(0xc0000c2700)
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:359 +0x105
github.com/grafana/loki/pkg/storage/chunk/client/azure.(*BlobStorage).newPipeline(0xc0000c2700, {0xee6b280, 0x3, 0x14}, 0x0)
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:343 +0x25b
github.com/grafana/loki/pkg/storage/chunk/client/azure.NewBlobStorage(0xc0009a60f0, {0xc00011a6f0?, {0x29fe7e0?, 0xc0003d9f80?}}, {0x34630b8a000?, 0x8bb2c97000?, 0xdf8475800?})
    /src/loki/pkg/storage/chunk/client/azure/blob_storage_client.go:197 +0x14c
github.com/grafana/loki/pkg/storage.NewObjectClient({_, _}, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, ...}, ...)
    /src/loki/pkg/storage/factory.go:515 +0x985
github.com/grafana/loki/pkg/loki.(*Loki).initUsageReport(0xc0005d6800)
    /src/loki/pkg/loki/modules.go:1182 +0x247
github.com/grafana/dskit/modules.(*Manager).initModule(0xc000117458, {0x7ffde5172c40, 0x7}, 0x1?, 0xc000567ce0?)
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:120 +0x20a
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x856d54?, {0xc00074e080, 0x1, 0xc00074f8e0?})
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:92 +0xf8
github.com/grafana/loki/pkg/loki.(*Loki).Run(0xc0005d6800, {0xc00065e740?})
    /src/loki/pkg/loki/loki.go:457 +0x56
main.main()
    /src/loki/cmd/loki/main.go:110 +0xe65
```
I noticed the same issue today after updating to the latest version of the loki-distributed chart, because we wanted to use workload identity instead of pod identity. In our case the issue was quickly solved: I noticed the AZURE_CLIENT_ID env variable on the pod was empty (the other injected env variables were valid) and rectified it by ensuring the service account had the right client-id annotation. After that there were no issues, since the permissions were already configured for the managed identity, and it was an easy switch from pod identity to workload identity.
I have the same problem. @monaka did you find any solution?
same here