
[BUG] Error connecting to AKS-HCI service host

Open EkeleAsonye opened this issue 3 years ago • 5 comments

Describe the bug Issue was reported by a partner, Dell Corp.

When I RDP'd to one of the nodes in my Azure Stack HCI cluster and ran the Get-AksHciCluster command, I got an error saying that the established connection failed because the host failed to respond.

Probing further, an operator cannot access the mgmt cluster using the kubeconfig-mgmt. Commands against the cluster, including certain PowerShell commands that use the kubeconfig-mgmt, fail with an error like: Unable to connect to the server: dial tcp 172.168.10.0:6443, where 172.168.10.0 is the IP of the control plane.

image

Additional context The kube-vip pod that advertises the control plane IP may be down. The pod will restart and the k8s API server may be available intermittently until it crashes again.
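To tell a dead endpoint from one that is only intermittently reachable, a repeated TCP probe of the control plane port can help. The following is a minimal sketch, not part of AKS-HCI tooling: the function names `probe` and `watch` and their parameters are hypothetical, and the address and port are simply the ones quoted in the error above. It only checks that the port accepts a TCP connection; it does not verify that the API server itself is healthy.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def watch(host, port, attempts=5, interval=2.0, timeout=3.0):
    """Probe host:port several times and return the list of boolean results."""
    results = []
    for i in range(attempts):
        results.append(probe(host, port, timeout))
        if i < attempts - 1:
            time.sleep(interval)  # wait before the next probe
    return results
```

Calling `watch("172.168.10.0", 6443)` from a cluster node returns one boolean per attempt. A mix of `True` and `False` would be consistent with the intermittent availability described above, while all `False` suggests the control plane IP is not being advertised at all.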

EkeleAsonye, Aug 16 '21 17:08

I had the same issue some weeks ago. I noticed out-of-memory errors on the management cluster VM: image

Same root cause in your case?

The mgmt VM gets 8 GB by default, but Hyper-V shows a memory demand of 22 GB. I set the mgmt VM in Hyper-V to 32 GB and everything has worked fine for the last 3 weeks. I've reported the issue to the AKS-HCI PMs; maybe there is already an open bug.

Elektronenvolt, Aug 16 '21 20:08

@Elektronenvolt what version of AksHci are you on?

The management cluster should not need that much memory.

zawachte, Aug 17 '21 16:08

@zawachte-msft I first saw the OOM issue with the 06/2021 release, two weeks after initial setup. Right now I'm running the latest July release. image No OOM issues so far, but I set the mgmt cluster VM to 32 GB after initial setup.

Elektronenvolt, Aug 17 '21 17:08

@Elektronenvolt - Since Aug 2021, we've had multiple new releases of AKS-HCI. Can you please try with the latest version and let us know if you still hit this issue?

abhilashaagarwala, Oct 20 '22 18:10

@abhilashaagarwala - I haven't seen the issue with any 2022 release, but I also wasn't watching for OOM errors. I'll verify it with the September 2022 release and let you know.

Elektronenvolt, Oct 24 '22 03:10