AKS
[BUG] Pods are not getting created with AKS on Windows with Cilium as the CNI, even though max pods is set to 250
Describe the bug
- AKS on Windows with Cilium as the CNI (CNI mode: aksbyocni).
- A cluster is brought up on AKS with Windows:
```shell
az provider show -n Microsoft.OperationsManagement -o table
az provider show -n Microsoft.OperationalInsights -o table
az provider register --namespace Microsoft.OperationsManagement
az provider register --namespace Microsoft.OperationalInsights

az group create --name myResourceGroup --location eastus

echo "Please enter the username to use as administrator credentials for Windows Server nodes on your cluster: " && read WINDOWS_USERNAME
echo "Please enter the password to use as administrator credentials for Windows Server nodes on your cluster: " && read WINDOWS_PASSWORD

az aks create \
  --resource-group myResourceGroup \
  --name cluster1 \
  --node-count 2 \
  --enable-addons monitoring \
  --generate-ssh-keys \
  --windows-admin-username $WINDOWS_USERNAME \
  --windows-admin-password $WINDOWS_PASSWORD \
  --vm-set-type VirtualMachineScaleSets \
  --network-plugin none
```
```shell
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name cluster1 \
  --os-type Windows \
  --name npwin \
  --node-count 1
```
Cilium is then installed:

```shell
helm install cilium cilium/cilium --version 1.13.4 \
  --namespace kube-system \
  --set aksbyocni.enabled=true \
  --set nodeinit.enabled=true
```
To Reproduce
- As soon as Cilium comes up, we see a few containers stuck in ContainerCreating state:

```
NAMESPACE     NAME                                   READY   STATUS              RESTARTS   AGE
kube-system   ama-logs-4phmr                         3/3     Running             0          17m
kube-system   ama-logs-gdb7j                         3/3     Running             0          18m
kube-system   ama-logs-rs-54d8df865-9bg7n            2/2     Running             0          18m
kube-system   ama-logs-windows-vdjfk                 0/1     ContainerCreating   0          8m38s
kube-system   cilium-85qt7                           1/1     Running             0          2m9s
kube-system   cilium-node-init-brdl5                 1/1     Running             0          2m9s
kube-system   cilium-node-init-r4prg                 1/1     Running             0          2m9s
kube-system   cilium-nvsb9                           1/1     Running             0          2m9s
kube-system   cilium-operator-fdc5f8984-h78jd        1/1     Running             0          2m9s
kube-system   cilium-operator-fdc5f8984-rzghw        1/1     Running             0          2m9s
kube-system   cloud-node-manager-8bv9d               1/1     Running             0          17m
kube-system   cloud-node-manager-k9pd4               1/1     Running             0          18m
kube-system   cloud-node-manager-windows-qs5v5       0/1     ContainerCreating   0          9m6s
kube-system   coredns-autoscaler-69b7556b86-fqlqv    1/1     Running             0          18m
kube-system   coredns-fb6b9d95f-lbgbh                1/1     Running             0          18m
kube-system   coredns-fb6b9d95f-wwr86                1/1     Running             0          68s
kube-system   csi-azuredisk-node-cq7zz               3/3     Running             0          18m
kube-system   csi-azuredisk-node-pkwjq               3/3     Running             0          17m
kube-system   csi-azuredisk-node-win-b98wk           0/3     ContainerCreating   0          9m6s
kube-system   csi-azurefile-node-v9gbm               3/3     Running             0          18m
kube-system   csi-azurefile-node-win-jbv5k           0/3     ContainerCreating   0          9m6s
kube-system   csi-azurefile-node-zm4ln               3/3     Running             0          17m
kube-system   konnectivity-agent-f6b459979-jm8v7     1/1     Running             0          3m31s
kube-system   konnectivity-agent-f6b459979-ktzx5     1/1     Running             0          3m41s
kube-system   kube-proxy-8hdwv                       1/1     Running             0          17m
kube-system   kube-proxy-d4msp                       1/1     Running             0          18m
kube-system   metrics-server-5dd7f7965f-kbz92        2/2     Running             0          73s
kube-system   metrics-server-5dd7f7965f-w2vkp        2/2     Running             0          73s
```
```shell
kubectl get pods -A | wc -l
30
```
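The stuck pods can be isolated without scanning the full listing. A minimal sketch, assuming the default `kubectl get pods -A` column layout shown above (STATUS is column 4):

```shell
# Filter 'kubectl get pods -A' output down to pods whose STATUS is not Running.
# Column 4 is STATUS in the default all-namespaces output.
kubectl get pods -A --no-headers | awk '$4 != "Running" {print $1, $2, $4}'
```

On the listing above this would print only the four Windows-node pods stuck in ContainerCreating.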
###############
On further triaging, we can see that the pods are complaining that IP addresses are not available:
```
Warning  FailedCreatePodSandBox  9m41s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4e6946968e241b72da5c1523541f0466a5f3475b6f3083fac27660e59e60051e": plugin type="azure-vnet" failed (add): IPAM Invoker Add failed with error: Failed to allocate pool: Failed to delegate: Failed to allocate address: No available addresses
```
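The plugin named in that event is the key clue: here it is `azure-vnet`, meaning the Windows node is still invoking the default Azure CNI rather than `cilium-cni`. A hedged sketch of pulling that field out of the pod's events (the pod name is one of the stuck pods above; adjust as needed):

```shell
# Extract the CNI plugin named in the FailedCreatePodSandBox event text.
kubectl -n kube-system describe pod csi-azurefile-node-win-jbv5k \
  | grep -o 'plugin type="[^"]*"'
```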
###############
From /etc/default/kubelet on the node, we can see that max pods is set to 250 (the flag dump below is truncated as captured):
```
KUBELET_FLAGS=--address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --azure-container-registry-config=/etc/kubernetes/azure.json --cgroups-per-qos=true --client-ca-file=/etc/kubernetes/certs/ca.crt --cloud-provider=external --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --container-log-max-size=50M --enforce-node-allocatable=pods --event-qps=0 --eviction-hard=memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%,pid.available<2000 --feature-gates=CSIMigration=true,CSIMigrationAzureDisk=true,CSIMigrationAzureFile=true,DelegateFSGroupToCSIDriver=true --image-gc-high-threshold=85 --image-gc-low-threshold=80 --keep-terminated-pod-volumes=false --kube-reserved=cpu=100m,memory=1638Mi,pid=1000 --kubeconfig=/var/lib/kubelet/kubeconfig --max-pods=250 --node-status-update-frequency=10s --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.6 --pod-manifest-path=/etc/kubernetes/manifests --protect-kernel-defaults=true --read-only-port=0 --rotate-certificates=true --streaming-connection-idle-timeout=4h --tls-cert-file=/etc/kubernetes/certs/kubeletserver.crt --tls-
```
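A quick way to confirm the limit without reading the whole flag line, assuming the single-line `KUBELET_FLAGS=` format shown above:

```shell
# Extract the configured pod limit from the kubelet flags on the node.
grep -o 'max-pods=[0-9]*' /etc/default/kubelet
```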
###################
```json
{
  "cniVersion": "0.3.1",
  "name": "cilium",
  "type": "cilium-cni",
  "enable-debug": false,
  "log-file": "/var/run/cilium/cilium-cni.log"
}
```
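This conflist shows which plugin the Linux nodes hand pods to. A small sketch to read the plugin type from a CNI config file (assumes `python3` is available on the node; the filename under the standard `/etc/cni/net.d` directory varies by Cilium version, so treat the path as hypothetical):

```shell
# Print the plugin type from a CNI config. On a Linux node this should be
# "cilium-cni"; the Windows node's sandbox error above shows it is still
# routing through "azure-vnet".
python3 -c 'import json,sys; print(json.load(sys.stdin)["type"])' \
  < /etc/cni/net.d/05-cilium.conf
```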
Expected behavior
All pods should be up and running so that further tests can be done.
Environment (please complete the following information):
- CLI Version 2.49.0
- Kubernetes version 1.25.6
@amitmavgupta It's currently not available for Windows node pools. Azure CNI Powered by Cilium currently has the following limitation: available only for Linux and not for Windows.
https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium#limitations
@abarqawi Thanks. Any timeline for this support to be available?
@amitmavgupta - Could you provide additional detail regarding your requirement for Windows support?
I would also like this for Windows node pools. The requirement is simply that they are supported: currently, if a cluster has Windows nodes, you can't move to Cilium. I would just like to know whether it's on a future roadmap or is never going to happen.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs within 7 days of this comment. @allyford
This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. @amitmavgupta feel free to comment again within the next 7 days to reopen, or open a new issue after that time if you still have a question/issue or suggestion.