
Context Canceled with Azure Managed Identity


We upgraded our cluster's aad-pod-identity Helm chart to 4.1.10 and are now getting the following error when trying to use an Azure Queue ScaledJob with KEDA:

2022-06-27T10:49:21-07:00 1.6563521617735345e+09	ERROR	azure_queue_scaler	error)	{"error": "-> github.com/Azure/azure-pipeline-go/pipeline.NewError, /go/pkg/mod/github.com/!azure/[email protected]/pipeline/error.go:157\nHTTP request failed\n\nGet \"https://usdevstorage.queue.core.windows.net/queue1/messages?numofmessages=32&peekonly=true&timeout=61\": context canceled\n"}
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scalers.(*azureQueueScaler).IsActive
2022-06-27T10:49:21-07:00 	/workspace/pkg/scalers/azure_queue_scaler.go:160
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).getScaledJobMetrics
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/cache/scalers_cache.go:257
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledJobActive
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/cache/scalers_cache.go:124
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/scale_handler.go:286
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/scale_handler.go:149
2022-06-27T10:49:21-07:00 1.656352161774509e+09	ERROR	azure_queue_scaler	error)	{"error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com%2F\": context canceled"}
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scalers.(*azureQueueScaler).IsActive
2022-06-27T10:49:21-07:00 	/workspace/pkg/scalers/azure_queue_scaler.go:160
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).getScaledJobMetrics
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/cache/scalers_cache.go:262
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledJobActive
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/cache/scalers_cache.go:124
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/scale_handler.go:286
2022-06-27T10:49:21-07:00 github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
2022-06-27T10:49:21-07:00 	/workspace/pkg/scaling/scale_handler.go:149
2022-06-27T10:49:21-07:00 1.65635216177456e+09	DEBUG	scalehandler	Error getting scaler.IsActive, but continue	{"ScaledJob": "azure-queue-scaledjob", "Scaler": "cache.ScalerBuilder:", "Error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com%2F\": context canceled"}

The same managed identity with the same permissions worked before the upgrade. It looks like it might be a permission or token timeout, but we are not sure.

Expected Behavior

The KEDA operator should be able to read the Azure queue and scale up our application pods to handle the messages in the queue.

Actual Behavior

The KEDA operator fails with a context canceled error when running the ScaledJob.

Steps to Reproduce the Problem

  1. Create an Azure managed identity with queue contributor permissions on the storage account
  2. Deploy the KEDA Helm chart with podIdentity.activeDirectory.identity set on an AKS cluster
  3. Create an Azure queue ScaledJob and add one item to the queue (a minimal manifest sketch follows this list)
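
For reference, a minimal sketch of what step 3 might look like, assuming the keda-operator pod is bound to the identity via the podIdentity.activeDirectory.identity Helm value from step 2. The queue name, storage account, and ScaledJob name are taken from the error log above; the TriggerAuthentication name, container image, and scaling parameters are placeholders, not values from this issue:

```yaml
# Minimal sketch only; names not present in the issue are placeholders.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azure-queue-auth            # placeholder name
spec:
  podIdentity:
    provider: azure                 # aad-pod-identity based authentication
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azure-queue-scaledjob       # ScaledJob name seen in the DEBUG log line
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: example.azurecr.io/queue-worker:latest   # placeholder image
        restartPolicy: Never
  pollingInterval: 30               # assumed; not stated in the issue
  maxReplicaCount: 10               # assumed; not stated in the issue
  triggers:
    - type: azure-queue
      metadata:
        queueName: queue1           # queue from the failing request URL
        accountName: usdevstorage   # storage account from the failing request URL
        queueLength: "5"            # assumed threshold
      authenticationRef:
        name: azure-queue-auth
```

Note that with pod identity there is no connection string, so the azure-queue trigger identifies the target via accountName and relies on the identity bound to the KEDA operator pod for the token request seen in the log (the call to 169.254.169.254).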

Specifications

  • KEDA Version: 2.7.2
  • Platform & Version: AKS
  • Kubernetes Version: 1.22
  • Scaler(s): Azure Storage Queue Scaler

alenesho116 avatar Jun 27 '22 18:06 alenesho116

What happens when you downgrade the aad-pod-identity helm chart?

We have seen similar but different issues since last Friday (24 June 2022) that are unrelated to Azure Managed Identities.

brainslush avatar Jun 29 '22 07:06 brainslush

When we use the aad-pod-identity Helm chart at version 4.1.8, this works fine; the issue began with chart version 4.1.10.

Others are also reporting timeout issues during token retrieval with pod identity when using a setup similar to KEDA's with Azure: https://github.com/Azure/aad-pod-identity/issues/1287

alenesho116 avatar Jun 29 '22 16:06 alenesho116

Just tested a downgrade of the aad-pod-identity Helm chart and we are still getting the same error.

alenesho116 avatar Jun 29 '22 22:06 alenesho116

Just assume for a second this error is unrelated to the Helm chart. When did you see this error for the first time? I'm trying to pinpoint the source of the issue and somehow I get the feeling that Azure might be the source.

brainslush avatar Jun 30 '22 08:06 brainslush

Are there any errors in aad-pod-identity pods?

JorTurFer avatar Jul 05 '22 12:07 JorTurFer

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 03 '22 13:09 stale[bot]

This issue has been automatically closed due to inactivity.

stale[bot] avatar Sep 10 '22 13:09 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Nov 10 '22 20:11 stale[bot]

This issue has been automatically closed due to inactivity.

stale[bot] avatar Nov 17 '22 21:11 stale[bot]