keda
Context Canceled with Azure Managed Identity
We upgraded the aad-pod-identity Helm chart on our cluster to 4.1.10, and we now get the following error when using an Azure Queue ScaledJob with KEDA:
2022-06-27T10:49:21-07:00 1.6563521617735345e+09 ERROR azure_queue_scaler error) {"error": "-> github.com/Azure/azure-pipeline-go/pipeline.NewError, /go/pkg/mod/github.com/!azure/[email protected]/pipeline/error.go:157\nHTTP request failed\n\nGet \"https://usdevstorage.queue.core.windows.net/queue1/messages?numofmessages=32&peekonly=true&timeout=61\": context canceled\n"}
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scalers.(*azureQueueScaler).IsActive
2022-06-27T10:49:21-07:00 /workspace/pkg/scalers/azure_queue_scaler.go:160
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).getScaledJobMetrics
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/cache/scalers_cache.go:257
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledJobActive
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/cache/scalers_cache.go:124
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/scale_handler.go:286
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/scale_handler.go:149
2022-06-27T10:49:21-07:00 1.656352161774509e+09 ERROR azure_queue_scaler error) {"error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com%2F\": context canceled"}
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scalers.(*azureQueueScaler).IsActive
2022-06-27T10:49:21-07:00 /workspace/pkg/scalers/azure_queue_scaler.go:160
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).getScaledJobMetrics
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/cache/scalers_cache.go:262
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledJobActive
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/cache/scalers_cache.go:124
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/scale_handler.go:286
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
2022-06-27T10:49:21-07:00 /workspace/pkg/scaling/scale_handler.go:149
2022-06-27T10:49:21-07:00 1.65635216177456e+09 DEBUG scalehandler Error getting scaler.IsActive, but continue {"ScaledJob": "azure-queue-scaledjob", "Scaler": "cache.ScalerBuilder:", "Error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com%2F\": context canceled"}
The same managed identity, with the same permissions, worked before the upgrade. It looks like it might be a permission or timeout problem, but we are not sure.
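One way to test the timeout hypothesis is to request the same token outside KEDA, from a pod that carries the same identity binding, and time the response. The sketch below is a hypothetical diagnostic, not part of KEDA or aad-pod-identity: it rebuilds the token URL shown in the log (IMDS endpoint, api-version 2018-02-01, resource https://storage.azure.com/), sends it with the Metadata: true header that IMDS requires, and reports the HTTP status and elapsed time. The names build_token_url and probe_token are illustrative, and it assumes NMI intercepts this request the same way it does for the KEDA operator.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

# IMDS token endpoint, exactly as it appears in the azure_queue_scaler log above.
# With aad-pod-identity, requests to this address are intercepted by NMI.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_url(resource: str, api_version: str = "2018-02-01",
                    endpoint: str = IMDS_TOKEN_ENDPOINT) -> str:
    """Rebuild the token URL KEDA requested (the resource is URL-encoded)."""
    query = urllib.parse.urlencode({"api-version": api_version, "resource": resource})
    return f"{endpoint}?{query}"


def probe_token(resource: str = "https://storage.azure.com/", timeout_s: float = 5.0,
                endpoint: str = IMDS_TOKEN_ENDPOINT) -> dict:
    """Request one token and report status and elapsed time, never the token itself."""
    req = urllib.request.Request(build_token_url(resource, endpoint=endpoint),
                                 headers={"Metadata": "true"})
    # IMDS must be reached directly, so ignore any HTTP(S)_PROXY settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(req, timeout=timeout_s) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return {"ok": "access_token" in body, "status": resp.status,
                    "elapsed_s": time.monotonic() - start}
    except urllib.error.HTTPError as err:
        # The endpoint answered, but with an error status.
        return {"ok": False, "status": err.code, "error": str(err),
                "elapsed_s": time.monotonic() - start}
    except OSError as err:
        # URLError, timeouts and refused connections are all OSError subclasses.
        return {"ok": False, "status": None, "error": repr(err),
                "elapsed_s": time.monotonic() - start}
```

Calling probe_token() a few times from inside the cluster and seeing elapsed times of several seconds, or failures, would point at token retrieval (NMI or IMDS) rather than at KEDA itself. The result deliberately records only whether a token came back, so it is safe to paste into an issue.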
Expected Behavior
The KEDA operator should be able to read the Azure queue and scale up our application pods to handle the messages in the queue.
Actual Behavior
The KEDA operator fails with a "context canceled" error when running the ScaledJob.
Steps to Reproduce the Problem
- Create an Azure managed identity with queue contributor permissions on the storage account
- Deploy the KEDA Helm chart on an AKS cluster with podIdentity.activeDirectory.identity set
- Create an Azure Queue ScaledJob and add one item to the queue
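The reproduction steps above can be expressed as Kubernetes objects. This is a minimal sketch assuming KEDA 2.x's keda.sh/v1alpha1 CRDs with aad-pod-identity: a TriggerAuthentication using podIdentity provider azure, and a ScaledJob with a single azure-queue trigger. The storage account (usdevstorage), queue (queue1) and ScaledJob (azure-queue-scaledjob) names come from the log above; the container image, container name and TriggerAuthentication name are hypothetical placeholders. It emits JSON rather than YAML so that it needs only the Python standard library.

```python
import json

# Taken from the log above; everything else in this file is a placeholder.
ACCOUNT_NAME = "usdevstorage"
QUEUE_NAME = "queue1"


def trigger_authentication(name: str = "azure-queue-auth") -> dict:
    """TriggerAuthentication telling KEDA to authenticate via aad-pod-identity."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "TriggerAuthentication",
        "metadata": {"name": name},
        "spec": {"podIdentity": {"provider": "azure"}},
    }


def scaled_job(auth_name: str = "azure-queue-auth",
               image: str = "example.azurecr.io/queue-consumer:latest") -> dict:
    """ScaledJob with one azure-queue trigger that uses the TriggerAuthentication."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledJob",
        "metadata": {"name": "azure-queue-scaledjob"},
        "spec": {
            "jobTargetRef": {
                "template": {
                    "spec": {
                        "containers": [{"name": "consumer", "image": image}],
                        # Job pods cannot use the default restartPolicy (Always).
                        "restartPolicy": "Never",
                    }
                }
            },
            "triggers": [{
                "type": "azure-queue",
                # Trigger metadata values are strings in KEDA manifests.
                "metadata": {
                    "queueName": QUEUE_NAME,
                    "queueLength": "1",
                    "accountName": ACCOUNT_NAME,
                },
                "authenticationRef": {"name": auth_name},
            }],
        },
    }


def manifest_list() -> str:
    """Render both objects as a Kubernetes v1 List."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List",
         "items": [trigger_authentication(), scaled_job()]},
        indent=2,
    )
```

The output of manifest_list() can be piped to kubectl apply -f -. As we understand it, with provider azure the token is requested by the KEDA operator pod, so this only works if the operator already carries the identity binding configured through podIdentity.activeDirectory.identity.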
Specifications
- KEDA Version: 2.7.2
- Platform & Version: AKS
- Kubernetes Version: 1.22
- Scaler(s): Azure Storage Queue Scaler
What happens when you downgrade the aad-pod-identity helm chart?
We have seen similar, but different, issues since last Friday (24 June 2022) that are unrelated to Azure Managed Identities.
With aad-pod-identity Helm chart version 4.1.8 this works fine; the issue began with chart version 4.1.10.
Others are also reporting timeouts during token retrieval with pod identity in a setup similar to KEDA's on Azure: https://github.com/Azure/aad-pod-identity/issues/1287
I just tested downgrading the aad-pod-identity Helm chart and am still getting the same error.
Just assume for a second this error is unrelated to the Helm chart. When did you see this error for the first time? I'm trying to pinpoint the source of the issue and somehow I get the feeling that Azure might be the source.
Are there any errors in aad-pod-identity pods?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.