AKS
[BUG] Command Invoke in private cluster creates pods that are never deleted if the command(s) doesn't go through
Describe the bug
Create a private cluster in AKS following this document: https://learn.microsoft.com/en-gb/azure/aks/access-private-cluster?tabs=azure-portal
To Reproduce
Run a command that fails, as in the example below (the Helm repo had not been added, which was the user's mistake):
az aks command invoke \
--resource-group privatecluster \
--name privatecluster \
--command "helm install cilium cilium/cilium --version 1.13.4 --namespace kube-system --set aksbyocni.enabled=true --set nodeinit.enabled=true"
command started at 2023-10-04 12:54:50+00:00, finished at 2023-10-04 12:54:51+00:00 with exitcode=1
Error: INSTALLATION FAILED: repo cilium not found
After fixing this error, list the pods with the command below. A few pods are in the Error state because their commands did not execute successfully. This is not transient: the age of the pods shows that they are never deleted.
az aks command invoke \
--resource-group privatecluster \
--name privatecluster \
--command "kubectl get pods -A -o wide"
command started at 2023-10-04 13:14:31+00:00, finished at 2023-10-04 13:14:32+00:00 with exitcode=0
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
aks-command command-0679b33e1a164f34b3e17580607a1dd6 0/1 Completed 0 20m 10.244.2.3 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-1464a85e6b8844adb9bdc8a6f60d1336 0/1 Completed 0 41s 10.0.2.213 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-278f23c16d45436b8ceca12663b6d196 0/1 Completed 0 16m 10.0.2.97 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-32a4a1e90c2748d09ee178259795917d 0/1 Error 0 19m 10.244.2.5 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-57870e11e1094f66aa1fb44466638a65 0/1 Error 0 19m 10.244.2.4 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-7d63767d93024d04ac83198d6a24695d 0/1 Completed 0 26s 10.0.2.209 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-8acea6b0a6af4bb189a3bb7bcf5cff5e 1/1 Running 0 2s 10.0.2.198 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-b77e7fcb9fac44e8a8c53e8e198bdfe1 0/1 Completed 0 18m 10.244.2.6 aks-nodepool1-26012924-vmss000002 <none> <none>
aks-command command-c627cfe313cc4a7c974a322e53165d25 0/1 Completed 0 18m 10.244.2.7 aks-nodepool1-26012924-vmss000002 <none> <none>
kube-system azure-ip-masq-agent-lbjxj 1/1 Running 0 29m 10.224.0.7 aks-nodepool1-26012924-vmss000001 <none> <none>
kube-system azure-ip-masq-agent-r4frx 1/1 Running 0 29m 10.224.0.5 aks-nodepool1-26012924-vmss000000 <none> <none>
kube-system azure-ip-masq-agent-wph62 1/1 Running 0 29m 10.224.0.6 aks-nodepool1-26012924-vmss000002 <none> <none>
Expected behavior
These pods should end up in the Completed state; otherwise, a user who relies on this feature eats into the pod count that a particular network plugin supports.
Screenshots
Environment (please complete the following information):
- CLI Version 2.53.0
- Kubernetes version 1.26.6
Additional context
@wedaly @tamilmani1989
I'm not familiar with the az aks command invoke, but I'd guess it keeps the failed pods so a user could inspect the logs? I don't believe pods in Error state count towards the max pods limit on a node.
Ah yes, that makes sense Will. Didn't think about that.
Also, good to know that it doesn't count towards the max pods limit.
@wedaly do we still need to track this, just to make sure that someone can see why these pods are left hanging, or is it safe to ignore?
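For anyone who hits the same leftover pods, here is a minimal sketch of the workflow discussed above: list the failed command pods, read the logs of one, then delete them. It is not an official cleanup procedure. It assumes that the pods live in the aks-command namespace (as in the listing in the report), that the Error status corresponds to the Failed pod phase, and that deleting them is safe once the logs are no longer needed. `run_in_cluster` and `DRY_RUN` are hypothetical names introduced for this example only. The sketch defaults to printing the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: inspect and clean up failed "command invoke" pods.
# Assumption: leftover pods are in the aks-command namespace and are safe to delete.
set -euo pipefail

RESOURCE_GROUP="${RESOURCE_GROUP:-privatecluster}"
CLUSTER_NAME="${CLUSTER_NAME:-privatecluster}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints the commands, 0 runs them

# run_in_cluster is a hypothetical helper that wraps az aks command invoke.
run_in_cluster() {
  local cmd="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "az aks command invoke --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME --command \"$cmd\""
  else
    az aks command invoke \
      --resource-group "$RESOURCE_GROUP" \
      --name "$CLUSTER_NAME" \
      --command "$cmd"
  fi
}

# 1. List only the pods whose phase is Failed (assumed to be the ones shown as Error).
run_in_cluster "kubectl get pods -n aks-command --field-selector=status.phase=Failed"

# 2. Read the logs of one failed pod (pod name taken from the listing in the report).
run_in_cluster "kubectl logs -n aks-command command-32a4a1e90c2748d09ee178259795917d"

# 3. Delete the failed pods once their logs are no longer needed.
run_in_cluster "kubectl delete pods -n aks-command --field-selector=status.phase=Failed"
```

Run it with `DRY_RUN=0` to execute the commands against the cluster. Judging by the listing in the report, each invocation itself starts a new command pod in the same namespace, so a fresh pod may still appear right after the cleanup.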
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads