tracking issue for Windows support

Open neolit123 opened this issue 5 years ago • 28 comments

kubeadm currently does not work on Windows. there are plans to try to get this back into shape in the 1.15 cycle.

kubernetes/enhancements tracking issue: kubernetes/enhancements#995

KEP was added here: kubernetes/enhancements#994


beta graduation:

  • [x] split kubelet flags per OS assigned: @gab-satchi PR: https://github.com/kubernetes/kubernetes/pull/88287

  • [ ] ~upgrades~ upgrades were delegated to documentation; having scripts for the process is not really needed.

  • [x] add remaining scripts to sig-windows-tools assigned: @benmoss PR: https://github.com/kubernetes-sigs/sig-windows-tools/pull/34

  • [x] set up e2e tests assigned: @benmoss @neolit123 PRs: https://github.com/kubernetes/test-infra/pull/16718 https://github.com/kubernetes-sigs/sig-windows-tools/pull/39 https://k8s-testgrid.appspot.com/sig-windows#kubeadm-windows-gcp-k8s-stable status: debugging e2e failures / flakes.

  • [x] finalize the documentation assigned: @benmoss PR: https://github.com/kubernetes/website/pull/19217 status: merged


alpha graduation:

a list of cleanup changes that we can do regardless:

  • [x] fix Windows related paths and defaults assigned: @ksubrmnn
    PR: https://github.com/kubernetes/kubernetes/pull/77710 PR: https://github.com/kubernetes/kubernetes/pull/78053

  • [x] kube-proxy retry mechanic assigned: @ksubrmnn PR: https://github.com/kubernetes/kubernetes/pull/78612

  • [x] flanneld should support a flag for its config assigned: @neolit123 PR: https://github.com/coreos/flannel/pull/1136

  • [x] docs assigned @ksubrmnn PR: https://github.com/kubernetes/website/pull/14644

  • [x] install script assigned @ksubrmnn PR: https://github.com/kubernetes-sigs/sig-windows-tools/pull/1 PR: TODO


side work:

  • [ ] fix wrongly defaulted kubelet flags on windows: PR: TODO https://github.com/kubernetes/kubeadm/issues/2967

  • [ ] add preflight checks (if needed) assigned: @benmoss PR: TODO possibly only support 1803+? also see https://github.com/kubernetes/kubernetes/blob/0f93328c7a051e28a097270daaf7a7ff6f90bae0/cmd/kubeadm/app/util/system/types_windows.go

  • [x] don't depend on powershell calls: both kubeadm and pkg/util/initsystem depend on powershell; these should be system calls instead. assigned: @ksubrmnn PR: https://github.com/kubernetes/kubernetes/pull/77989 PR: https://github.com/kubernetes/kubernetes/pull/78189 PR: TODO (system checks still have this, see https://github.com/kubernetes/kubernetes/blob/0f93328c7a051e28a097270daaf7a7ff6f90bae0/cmd/kubeadm/app/util/system/types_windows.go)

  • [x] fix the symbolic links that are currently required in https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/kubeadm/scripts/PrepareNode.ps1#L65 see https://github.com/kubernetes/kubeadm/issues/2330 https://github.com/kubernetes/kubeadm/issues/2419
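
For the "possibly only support 1803+" preflight idea above, a minimal sketch of what a build-number gate could look like. It assumes the OS version arrives as a string such as "10.0.17134" and that build 17134 corresponds to Windows version 1803; the function names are hypothetical and this is not the validator in types_windows.go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minBuild1803 is the build number assumed here for Windows version 1803.
const minBuild1803 = 17134

// buildNumber extracts the third dotted component ("major.minor.build")
// from a version string such as "10.0.17134".
func buildNumber(version string) (int, error) {
	parts := strings.Split(strings.TrimSpace(version), ".")
	if len(parts) < 3 {
		return 0, fmt.Errorf("unexpected version format %q", version)
	}
	return strconv.Atoi(parts[2])
}

// checkMinimumBuild returns an error when the build is older than 1803.
// Hypothetical helper for illustration only.
func checkMinimumBuild(version string) error {
	build, err := buildNumber(version)
	if err != nil {
		return err
	}
	if build < minBuild1803 {
		return fmt.Errorf("build %d is older than the required build %d (version 1803)", build, minBuild1803)
	}
	return nil
}

func main() {
	for _, v := range []string{"10.0.16299", "10.0.17134", "10.0.17763"} {
		if err := checkMinimumBuild(v); err != nil {
			fmt.Printf("%s: rejected: %v\n", v, err)
			continue
		}
		fmt.Printf("%s: ok\n", v)
	}
}
```

Keeping the parsing separate from the comparison would let a real check feed in the version from whatever system call ends up replacing the powershell lookup.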


/kind feature /area ecosystem /priority important-longterm /assign

cc @michmike @PatrickLang

neolit123 avatar Feb 06 '19 18:02 neolit123

WIP google doc for ideas: https://docs.google.com/document/d/1yaT7K85qMvZD7Q-ejWHBko1fgGaeGtjGLEZZ_Bz63VA/edit?usp=sharing

neolit123 avatar Apr 03 '19 13:04 neolit123

kubernetes/enhancements tracking issue: https://github.com/kubernetes/enhancements/issues/995

KEP was added here: https://github.com/kubernetes/enhancements/pull/994

neolit123 avatar Apr 24 '19 19:04 neolit123

updated the OP with:

a list of cleanup changes that we can do regardless:

neolit123 avatar Apr 26 '19 17:04 neolit123

I don't see any preflight checks that are failing or not appropriate for Windows:

PS C:\> .\winsw\join.ps1
I0508 14:27:14.679350    1320 join.go:364] [preflight] found NodeName empty; using OS hostname as NodeName
I0508 14:27:14.681294    1320 initconfiguration.go:105] detected and using CRI socket: tcp://localhost:2375
[preflight] Running pre-flight checks
I0508 14:27:14.690354    1320 preflight.go:90] [preflight] Running general checks
I0508 14:27:14.952085    1320 checks.go:254] validating the existence and emptiness of directory \etc\kubernetes\manifests
I0508 14:27:14.953111    1320 checks.go:292] validating the existence of file \etc\kubernetes\kubelet.conf
I0508 14:27:14.961114    1320 checks.go:292] validating the existence of file \etc\kubernetes\bootstrap-kubelet.conf
I0508 14:27:14.963083    1320 checks.go:105] validating the container runtime
I0508 14:27:15.124971    1320 checks.go:131] validating if the service is enabled and active
I0508 14:27:16.139511    1320 checks.go:524] running all checks
I0508 14:27:16.655173    1320 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0508 14:27:16.671956    1320 checks.go:622] validating kubelet version
I0508 14:27:16.834297    1320 checks.go:131] validating if the service is enabled and active
I0508 14:27:17.475948    1320 checks.go:209] validating availability of port 10250
I0508 14:27:17.476979    1320 checks.go:292] validating the existence of file C:/etc/kubernetes/pki/ca.crt
I0508 14:27:17.485021    1320 checks.go:439] validating if the connectivity type is via proxy or direct
I0508 14:27:17.487030    1320 join.go:426] [preflight] Discovering cluster-info
I0508 14:27:17.488914    1320 token.go:199] [discovery] Trying to connect to API Server "192.168.79.131:6443"
I0508 14:27:17.491183    1320 token.go:74] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.79.131:6443"
I0508 14:27:17.512331    1320 token.go:140] [discovery] Requesting info from "https://192.168.79.131:6443" again to validate TLS against the pinned public key
I0508 14:27:17.529788    1320 token.go:163] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.79.131:6443"
I0508 14:27:17.532935    1320 token.go:205] [discovery] Successfully established connection with API Server "192.168.79.131:6443"
I0508 14:27:17.534895    1320 join.go:440] [preflight] Fetching init configuration
I0508 14:27:17.535888    1320 join.go:473] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0508 14:27:17.571275    1320 interface.go:278] Looking for system interface with a global IPv4 address
I0508 14:27:17.572239    1320 interface.go:196] Interface Ethernet0 is up
I0508 14:27:17.585881    1320 interface.go:302] Skipping: no address family match for "fe80::a977:1755:66ff:8b87" on interface "Ethernet0".
I0508 14:27:17.586472    1320 interface.go:310] Found global unicast address "192.168.79.128" on interface "Ethernet0".
I0508 14:27:17.587191    1320 preflight.go:101] [preflight] Running configuration dependant checks
I0508 14:27:17.594267    1320 controlplaneprepare.go:207] [download-certs] Skipping certs download
I0508 14:27:17.595830    1320 kubelet.go:105] [kubelet-start] writing bootstrap kubelet config file at \etc\kubernetes\bootstrap-kubelet.conf
I0508 14:27:17.604244    1320 kubelet.go:113] [kubelet-start] writing CA certificate at C:/etc/kubernetes/pki/ca.crt
I0508 14:27:17.766276    1320 kubelet.go:131] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "\\var\\lib\\kubelet\\config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "\\var\\lib\\kubelet\\kubeadm-flags.env"
I0508 14:27:18.466047    1320 kubelet.go:148] [kubelet-start] Starting the kubelet
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connectex: No connection could be made because the target machine actively refused it..
I0508 14:28:43.240082    1320 kubelet.go:166] [kubelet-start] preserving the crisocket information for the node
I0508 14:28:43.241154    1320 patchnode.go:30] [patchnode] Uploading the CRI Socket information "tcp://localhost:2375" to the Node API object "win-vb8d2n40slh" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

benmoss avatar May 08 '19 18:05 benmoss

@benmoss did you start the kubelet service using the Start-Service frontend instead of sc?

neolit123 avatar May 08 '19 18:05 neolit123

also, did you have to apply the \ -> c:\ fix i did in \etc\kubernetes\kubelet.conf?

neolit123 avatar May 08 '19 18:05 neolit123

@benmoss Can you share .\winsw\join.ps1?

ksubrmnn avatar May 08 '19 18:05 ksubrmnn

I am using WinSW to wrap kubelet.exe as a Service. I really like WinSW as a service wrapper; it would be my vote over using the --windows-service flag.

https://github.com/benmoss/kubeadm-windows/blob/master/join.ps1 https://github.com/benmoss/kubeadm-windows/blob/master/kubelet.xml

To install the service you just need to run kubelet.exe install from that directory. The way WinSW works is you download the WinSW binary, rename it to the name of the service, and put it in the same directory as the corresponding xml config file. kubelet.exe install then registers it as a Windows service.

benmoss avatar May 08 '19 18:05 benmoss

i think it might be a case where sc does something differently. i will try the different options.

neolit123 avatar May 08 '19 18:05 neolit123

And no, I didn't have to fix the paths in /etc/kubernetes/kubelet.conf. The only path problem I'm running into is that kubelet is joining paths to /etc/kubernetes/pki/ca.crt incorrectly. It errors with:

F0508 14:27:19.857413    4916 server.go:251] unable to load client CA file C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt: open C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt: The system cannot find the path specified.

I have been working around that by just copying /etc into /var/lib/kubelet/ but that's obviously not right.
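
A sketch of how a result like the one in that error can arise: a rooted path with no drive letter gets treated as relative and appended to a working directory. Both helpers are hypothetical and use plain string handling so they behave the same on any OS; this is not the kubelet's actual code path, which the thread does not show.

```go
package main

import (
	"fmt"
	"strings"
)

// hasDriveLetter reports whether p starts with a drive prefix such as "C:".
func hasDriveLetter(p string) bool {
	if len(p) < 2 || p[1] != ':' {
		return false
	}
	c := p[0]
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// toWindows converts forward slashes to backslashes.
func toWindows(p string) string {
	return strings.ReplaceAll(p, "/", `\`)
}

// naiveResolve treats every path without a drive letter as relative to
// workDir, which reproduces the kind of doubled path seen in the error.
func naiveResolve(workDir, p string) string {
	p = toWindows(p)
	if hasDriveLetter(p) {
		return p
	}
	return strings.TrimRight(toWindows(workDir), `\`) + `\` + strings.TrimLeft(p, `\`)
}

// driveResolve roots a drive-less path at the given drive instead.
func driveResolve(drive, p string) string {
	p = toWindows(p)
	if hasDriveLetter(p) {
		return p
	}
	return drive + `\` + strings.TrimLeft(p, `\`)
}

func main() {
	ca := "/etc/kubernetes/pki/ca.crt"
	fmt.Println(naiveResolve(`C:\var\lib\kubelet`, ca)) // C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt
	fmt.Println(driveResolve("C:", ca))                 // C:\etc\kubernetes\pki\ca.crt
}
```

The second form matches the C:/etc/kubernetes/pki/ca.crt location that kubeadm reports writing in the join log.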

benmoss avatar May 08 '19 18:05 benmoss

updated OP with latest PRs merged. for 1.15 (alpha) remaining items are install script and docs.

EDIT: looks like the docs and script will miss the 1.15 release deadlines.

neolit123 avatar Jun 10 '19 14:06 neolit123

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jun 21 '20 18:06 fejta-bot

/remove-lifecycle stale

neolit123 avatar Jun 21 '20 18:06 neolit123

SIG-Windows Triage meeting

We will be tracking it here: https://github.com/kubernetes/enhancements/issues/995. Closing this for now.

immuzz avatar Aug 27 '20 16:08 immuzz

/close

immuzz avatar Aug 27 '20 16:08 immuzz

@immuzz: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 27 '20 16:08 k8s-ci-robot

@immuzz this issue is more granular and tracks separate development items compared to the main https://github.com/kubernetes/enhancements/issues/995

it should remain open.

neolit123 avatar Aug 27 '20 17:08 neolit123

@marosset @michmike

immuzz avatar Aug 27 '20 17:08 immuzz

/lifecycle stale

fejta-bot avatar Jan 24 '21 19:01 fejta-bot

/remove-lifecycle stale

neolit123 avatar Jan 24 '21 20:01 neolit123

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jun 23 '21 17:06 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten

k8s-triage-robot avatar Jul 26 '21 18:07 k8s-triage-robot

/remove-lifecycle rotten

neolit123 avatar Jul 26 '21 18:07 neolit123

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 24 '21 18:10 k8s-triage-robot

/remove-lifecycle stale

neolit123 avatar Oct 24 '21 20:10 neolit123

/lifecycle stale

k8s-triage-robot avatar Jan 22 '22 20:01 k8s-triage-robot

/lifecycle frozen

Windows k8s now supports host process (privileged) containers. This will simplify the CNI / proxy deployment, and we can graduate the kubeadm support to GA.

neolit123 avatar Jan 24 '22 09:01 neolit123

/cc TODO: once the Windows unit tests can run regularly, we need a testgrid board to track their code coverage, like https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit

pacoxu avatar Sep 27 '22 03:09 pacoxu