Ability to separate control plane components on dedicated nodes
What would you like to be added
Hello, I recently read an article about a large-scale Kubernetes cluster (https://openai.com/research/scaling-kubernetes-to-7500-nodes):
While some folks run API Servers within kube, we’ve always run them outside the cluster itself. Both etcd and API servers run on their own dedicated nodes. Our largest clusters run 5 API servers and 5 etcd nodes to spread the load and minimize impact if one were to ever go down. We’ve had no notable trouble with etcd since splitting out Kubernetes Events into their own etcd cluster back in our last blog post. API Servers are stateless and generally easy to run in a self-healing instance group or scaleset. We haven’t yet tried to build any self-healing automation of etcd clusters because incidents have been extremely rare.
The article describes running kube-apiserver externally, on dedicated nodes. My guess is that they run kube-apiserver as a systemd service on nodes used exclusively for it, without sharing them with other workloads.
When we need an HA-enabled Kubernetes cluster, we can choose the external etcd topology (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/#external-etcd-topology), and I think kube-apiserver could be used in the same way. When the x509 certificates for kube-apiserver are set up properly, a kube-apiserver detached from the control plane nodes can run normally.
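For reference, kubespray's inventory already allows etcd to live on its own machines by listing different hosts under the [etcd] group; a minimal sketch of that existing pattern (group names as in the sample inventory, hostnames are placeholders):
[kube_control_plane]
cp-1
cp-2
cp-3
[etcd]
etcd-1
etcd-2
etcd-3
[kube_node]
worker-1
worker-2
[k8s_cluster:children]
kube_control_plane
kube_node
What I am asking about would extend this pattern with an analogous group for hosts that run only kube-apiserver.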
Why is this needed
Support for running kube-apiserver on dedicated nodes, to obtain higher performance.
It shouldn't be a problem to keep the master nodes out of normal operation, e.g. via a taint:
Taints: dedicated=master:NoSchedule
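For example, applied with kubectl (the node name is a placeholder):
kubectl taint nodes cp-1 dedicated=master:NoSchedule
Any pod that should still land on that node would then need a matching toleration.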
@hufhend Yeah, you can disable scheduling pods onto the control plane with that taint.
By the way, what I want to describe is separating the kube-apiserver from the static pods created by kubeadm and running it as a systemd service on other dedicated machines.
I think it may not be necessary for many use cases.
Well, kubespray builds on top of kubeadm, so that would be a separate part to maintain.
Support for running kube-apiserver on dedicated nodes, to obtain higher performance.
Do you mean separating control plane components on different nodes? Aka, for instance:
[kube_api_server]
control_plane_1
control_plane_2
control_plane_3
[kube_controller_and_scheduler]
control_plane_4
control_plane_5
control_plane_6
I'm not sure if that's possible with kubeadm... :thinking: . Even if it is, it would be a big amount of work on kubespray, I think.
The OpenAI stuff isn't really relevant. As written in their articles, they have very specific workloads and a quite particular model.
@VannTen
Do you mean separating control plane components on different nodes? Aka, for instance:
Yes, I meant something like that. Additionally, the [kube_api_server] hosts could be nodes that are not Kubernetes control plane nodes, just nodes hosting kube-apiserver as a systemd service.
[kube_api_server]
not_control_plane_1
not_control_plane_2
not_control_plane_3
[kube_controller_and_scheduler]
control_plane_1
control_plane_2
control_plane_3
Because kube-apiserver can run as a binary or as a container, and it just needs the right x509 certificates to communicate with etcd, the kubelets, and the other control plane components in order to work properly. (Some of what I know about kube-apiserver could be wrong.)
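To make that concrete, here is a rough sketch of the certificate-related flags such a detached kube-apiserver would need; the paths follow the usual kubeadm layout under /etc/kubernetes/pki, and the etcd endpoint and issuer URL are placeholders:
# client certs towards etcd and the kubelets, plus the server's own serving cert
kube-apiserver \
  --etcd-servers=https://10.0.0.11:2379 \
  --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key \
  --client-ca-file=/etc/kubernetes/pki/ca.crt \
  --tls-cert-file=/etc/kubernetes/pki/apiserver.crt \
  --tls-private-key-file=/etc/kubernetes/pki/apiserver.key \
  --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key \
  --service-account-key-file=/etc/kubernetes/pki/sa.pub \
  --service-account-signing-key-file=/etc/kubernetes/pki/sa.key \
  --service-account-issuer=https://kubernetes.default.svc.cluster.local
The serving certificate would also need SANs covering the dedicated node's address, since clients now reach the API server there.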
I think it would be a tricky job to support this in kubespray, and it may not be used frequently in common use cases.
"If" this works really needed then some control plane should be provisioned wtih $ kubeadm init phase control-plane controller-manager
and kubeadm init phase control-plane scheduler
(https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-control-plane)
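In kubeadm terms that would be something like the following; the --config file name is an assumption:
# write static pod manifests for controller-manager and scheduler only,
# deliberately skipping the apiserver manifest
kubeadm init phase control-plane controller-manager --config kubeadm-config.yaml
kubeadm init phase control-plane scheduler --config kubeadm-config.yaml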
And the missing kube-apiserver would have to run on other nodes as a systemd service or a container.
I tested this concept with my local VM clusters and it can work, but applying it with kubespray might be tricky, as you mentioned. :)
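A minimal sketch of what such a systemd unit could look like; the unit layout, binary path, and environment file are assumptions, not something kubespray generates today:
[Unit]
Description=Kubernetes API Server (dedicated node)
After=network-online.target
Wants=network-online.target

[Service]
# KUBE_APISERVER_ARGS carries the certificate and etcd flags sketched above
EnvironmentFile=/etc/kubernetes/kube-apiserver.env
ExecStart=/usr/local/bin/kube-apiserver $KUBE_APISERVER_ARGS
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target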
The OpenAI stuff isn't really relevant. As written in their articles, they have very specific workloads and a quite particular model.
I agree with you: they have specific workloads and a particular model, so it might not be needed in most common cases!
I think this issue can be closed without any further work :)
I don't think whether the api-server runs on top of kubelet as a container or directly as a systemd service has much impact on performance (and systemd can run services exactly like containers, in fact).
Besides, running it outside of the cluster would forgo some advantages, like monitoring it like any other workload, etc.
What is interesting though is breaking up the control plane into several groups.
Instead of closing it, you could modify the title and description, if you want :).
I don't think whether the api-server runs on top of kubelet as a container or directly as a systemd service has much impact on performance (and systemd can run services exactly like containers, in fact).
@VannTen Yeah, it does not affect performance whether the server runs under systemd or as a container.
But my point is a dedicated node for kube-apiserver: a node that hosts only kube-apiserver and no other workloads.
I think this title and description might be confusing about what I want to say :)
For now, this feature is usually not needed.
Can you suggest a more suitable title for this?
Thank you for your interest :)
But my point is a dedicated node for kube-apiserver: a node that hosts only kube-apiserver and no other workloads.
Yeah I get it. But whether you run with systemd or kubelet is orthogonal to using dedicated nodes.
Can you suggest a more suitable title for this?
What I'm thinking about is "Ability to separate control plane components on dedicated nodes" but it might be a little different from what you intended.
Yeah I get it. But whether you run with systemd or kubelet is orthogonal to using dedicated nodes.
I get what you intended now.
As you said, using a dedicated node is one thing, and how the API server is run is another (systemd or container / static pods, whatever).
I will change this issue title based on your opinion :) Thank you! 👍
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.