kubelet fails to start on Windows since v1.31.0

Open tt-kkaiser opened this issue 1 year ago • 7 comments

What happened?

Since I upgraded my kubernetes cluster from v1.30.4 to v1.31.0, kubelet fails to restart on Windows.

The error messages in the logs are:

E0828 03:15:28.934935    5404 server.go:102] "Failed to listen to socket while starting device plugin registry" err="listen unix C:\\var\\lib\\kubelet\\device-plugins\\kubelet.sock: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted."
E0828 03:15:28.934935    5404 kubelet.go:1566] "Failed to start ContainerManager" err="listen unix C:\\var\\lib\\kubelet\\device-plugins\\kubelet.sock: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted."

What did you expect to happen?

I expected kubelet to start even if the kubelet.sock file already exists, as it did in previous versions.

How can we reproduce it (as minimally and precisely as possible)?

  1. Set up a Windows Kubernetes node
  2. Start kubelet.exe
  3. Stop kubelet.exe
  4. Start kubelet.exe again and watch it fail due to the kubelet.sock file already existing.

Anything else we need to know?

I think this issue is related to commit 4060ee6, after which socket files are no longer removed, so the start fails because the file already exists.

Kubernetes version

$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.0

Cloud provider

On Premise / No Cloud Provider

OS version

C:\> wmic os get Caption,Version,BuildNumber,OSArchitecture
BuildNumber  Caption                                              OSArchitecture  Version
20348        Microsoft Windows Server 2022 Datacenter Evaluation  64-bit          10.0.20348

Install tools

No response

Container runtime (CRI) and version (if applicable)

Containerd Version: 1.7.20

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

tt-kkaiser avatar Aug 28 '24 10:08 tt-kkaiser

/sig windows

tt-kkaiser avatar Aug 28 '24 10:08 tt-kkaiser

/area kubelet

tt-kkaiser avatar Aug 28 '24 10:08 tt-kkaiser

/cc

ffromani avatar Aug 28 '24 11:08 ffromani

/sig node /triage accepted /priority critical-urgent

ffromani avatar Aug 28 '24 11:08 ffromani

I was able to reproduce this

jsturtevant avatar Aug 28 '24 16:08 jsturtevant

/assign @jsturtevant

Given the severity, assigning to you.

kannon92 avatar Aug 28 '24 17:08 kannon92

Reverting https://github.com/kubernetes/kubernetes/commit/4060ee60c1d2e5ba1fba1f8729adfc211cee1b6f fixes the issue.

jsturtevant avatar Aug 28 '24 17:08 jsturtevant

The cause of the issue is that Kubernetes 1.31rc used Go 1.23rc, whereas the release uses Go 1.22.5: https://github.com/kubernetes/kubernetes/blob/eebc897e4fd8bf26a69d322c8dbcdf4da475934e/.go-version / https://github.com/kubernetes/kubernetes/pull/126330.

The os.Stat fix was implemented in Go 1.23.0 (and its rc) and has not been backported to 1.22.5 or 1.22.6. Fix commit: https://github.com/golang/go/commit/628b1015b972eabcc0a678ab69a74601239c40a4

So moving Kubernetes back to Go 1.23.0 fixes this issue and resolves the TODO in CleanupPluginDirectory.

tt-kkaiser avatar Aug 29 '24 11:08 tt-kkaiser

/reopen

The PR needs to be cherry-picked and released before we can actually close this.

kannon92 avatar Sep 03 '24 17:09 kannon92

@kannon92: Reopened this issue.

k8s-ci-robot avatar Sep 03 '24 17:09 k8s-ci-robot

https://github.com/kubernetes/kubernetes/pull/127100

jsturtevant avatar Sep 03 '24 20:09 jsturtevant

/close

The cherry-pick was merged, so this should be good now. The fix will be in v1.31.1 when that goes out.

kannon92 avatar Sep 04 '24 13:09 kannon92

@kannon92: Closing this issue.

k8s-ci-robot avatar Sep 04 '24 13:09 k8s-ci-robot