AKS
[BUG] Not able to create replica set in west us 3 region suddenly with error -failed calling webhook "mutation.azure-workload-identity.io"
Describe the bug In our CI/CD pipeline we noticed today that all helm deployments are suddenly failing in the west us 3 region (but not in west europe). Helm simply fails with 'timeout waiting' and gives no further info. After adding the debug flag and checking the replica set events, we saw that the replica set's pods are not getting created. The error is as follows...
Warning FailedCreate 61s (x15 over 4m2s) replicaset-controller Error creating: Internal error occurred: failed calling webhook "mutation.azure-workload-identity.io": failed to call webhook: Post "https://azure-wi-webhook-webhook-service.kube-system.svc:443/mutate-v1-pod?timeout=10s": context deadline exceeded
I can confirm that if we remove the items below from the helm chart, the deployment goes through (the actual app fails, though, as it can't find the identity, which is expected):
labels:
  azure.workload.identity/use: "true"
spec:
  serviceAccountName: "${SERVICE_ACCOUNT_NAME}"
Additional context This is a private AKS cluster (private endpoint to the API server) with workload identity enabled. We have validated that there have been no updates to the system in the last 2 days.
We also observed that in west us 3 these env variables are no longer injected automatically, even after disabling and re-enabling workload identity. The west europe region still has these values intact, even after a new deployment.
We've also been seeing this over the past 12 hours in our Staging environment. Did you find a solution, @sanjaydebnath?
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
I am also having the exact same issue
@sanjaydebnath Is this still happening? This can happen if the workload identity webhook pod or the tunnel pod is not running correctly. Could you check whether any pod in kube-system is not in the Running state? If this is still impacting you, I suggest creating a support ticket so that we can take a further look.
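For anyone who wants to script the kube-system check suggested above, here is a minimal sketch. It assumes you have saved or captured the JSON from `kubectl get pods -n kube-system -o json`; the function name `unhealthy_pods` is hypothetical and not part of any AKS tooling. It flags pods whose phase is not Running, and also Running pods whose containers are not all ready, since a crash-looping pod can still report the Running phase.

```python
import json


def unhealthy_pods(pod_list):
    """Return (name, phase) for pods that look unhealthy.

    `pod_list` is the parsed JSON from
    `kubectl get pods -n kube-system -o json` (a v1 PodList).
    """
    flagged = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "<unknown>")
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")

        # Completed pods (e.g. finished jobs) are not a problem.
        if phase == "Succeeded":
            continue

        # A pod can be in phase Running while its containers crash-loop,
        # so also require every reported container to be ready.
        statuses = status.get("containerStatuses", [])
        all_ready = bool(statuses) and all(c.get("ready", False) for c in statuses)

        if phase != "Running" or not all_ready:
            flagged.append((name, phase))
    return flagged


# Example usage (assumes the kubectl output was saved to pods.json):
#   with open("pods.json") as fh:
#       for name, phase in unhealthy_pods(json.load(fh)):
#           print(name, phase)
```

If the list includes the workload identity webhook pods or the tunnel pod, that would be consistent with the webhook timeout in the error above, but this only narrows things down; it does not confirm the root cause.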
This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed if no further activity occurs within 7 days of this comment. @karataliu
This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. sanjaydebnath, feel free to comment again within the next 7 days to reopen it, or open a new issue after that time if you still have a question, issue, or suggestion.