Ability to separate control plane components on dedicated nodes
What would you like to be added
Hello, I recently read an article about a large-scale Kubernetes cluster (https://openai.com/research/scaling-kubernetes-to-7500-nodes):
While some folks run API Servers within kube, we’ve always run them outside the cluster itself. Both etcd and API servers run on their own dedicated nodes. Our largest clusters run 5 API servers and 5 etcd nodes to spread the load and minimize impact if one were to ever go down. We’ve had no notable trouble with etcd since splitting out Kubernetes Events into their own etcd cluster back in our last blog post. API Servers are stateless and generally easy to run in a self-healing instance group or scaleset. We haven’t yet tried to build any self-healing automation of etcd clusters because incidents have been extremely rare.
The article describes running kube-apiserver externally, on dedicated nodes. My guess is that they run kube-apiserver as a systemd service on nodes used exclusively for it, without sharing them with other workloads.
When we need an HA-enabled Kubernetes cluster, we can choose the external etcd topology (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/#external-etcd-topology), and I think kube-apiserver could be used in the same way. When the x509 certificates for kube-apiserver are set up properly, a kube-apiserver detached from the control plane nodes can run normally.
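For reference, kubespray's inventory already allows etcd to live on its own machines by listing different hosts under the [etcd] group; a minimal sketch of that existing pattern (group names as in the sample inventory, hostnames are placeholders):
[kube_control_plane]
cp-1
cp-2
cp-3
[etcd]
etcd-1
etcd-2
etcd-3
[kube_node]
worker-1
worker-2
[k8s_cluster:children]
kube_control_plane
kube_node
What I am asking about would extend this pattern with an analogous group for hosts that run only kube-apiserver.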
Why is this needed
Support for running kube-apiserver on dedicated nodes, to obtain higher performance.
It shouldn't be a problem to keep the master nodes out of normal operation, e.g. via a taint:
Taints: dedicated=master:NoSchedule
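For example, applied with kubectl (the node name is a placeholder):
kubectl taint nodes cp-1 dedicated=master:NoSchedule
Any pod that should still land on that node would then need a matching toleration.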
@hufhend Yeah, you can disable scheduling pods onto the control plane with that taint.
By the way, what I want to describe is separating the kube-apiserver from the static pods created by kubeadm and running it as a systemd service on other dedicated machines.
I think it may not be necessary for many use cases.
Well, kubespray builds on top of kubeadm, so that would be a separate part to maintain.
Support for running kube-apiserver on dedicated nodes, to obtain higher performance.
Do you mean separating control plane components on different nodes? Aka, for instance:
[kube_api_server]
control_plane_1
control_plane_2
control_plane_3
[kube_controller_and_scheduler]
control_plane_4
control_plane_5
control_plane_6
I'm not sure if that's possible with kubeadm... :thinking: . Even if it is, it would be a big amount of work on kubespray, I think.
The OpenAI stuff isn't really relevant. As written in their articles, they have very specific workloads and a quite particular model.
@VannTen
Do you mean separating control plane components on different nodes? Aka, for instance:
Yes, I meant something like that. Additionally, the [kube_api_server] hosts could be nodes that are not Kubernetes control plane nodes, just nodes hosting kube-apiserver as a systemd service.
[kube_api_server]
not_control_plane_1
not_control_plane_2
not_control_plane_3
[kube_controller_and_scheduler]
control_plane_1
control_plane_2
control_plane_3
Because kube-apiserver can run as a binary or as a container, and it just needs the right x509 certificates to communicate with etcd, the kubelets, and the other control plane components in order to work properly. (Some of what I know about kube-apiserver could be wrong.)
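To make that concrete, here is a rough sketch of the certificate-related flags such a detached kube-apiserver would need; the paths follow the usual kubeadm layout under /etc/kubernetes/pki, and the etcd endpoint and issuer URL are placeholders:
# client certs towards etcd and the kubelets, plus the server's own serving cert
kube-apiserver \
  --etcd-servers=https://10.0.0.11:2379 \
  --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt \
  --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key \
  --client-ca-file=/etc/kubernetes/pki/ca.crt \
  --tls-cert-file=/etc/kubernetes/pki/apiserver.crt \
  --tls-private-key-file=/etc/kubernetes/pki/apiserver.key \
  --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key \
  --service-account-key-file=/etc/kubernetes/pki/sa.pub \
  --service-account-signing-key-file=/etc/kubernetes/pki/sa.key \
  --service-account-issuer=https://kubernetes.default.svc.cluster.local
The serving certificate would also need SANs covering the dedicated node's address, since clients now reach the API server there.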
I think it would be a tricky job to support this in kubespray, and it may not be used frequently in common use cases.
"If" this works really needed then some control plane should be provisioned wtih $ kubeadm init phase control-plane controller-manager
and kubeadm init phase control-plane scheduler
(https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-control-plane)
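In kubeadm terms that would be something like the following; the --config file name is an assumption:
# write static pod manifests for controller-manager and scheduler only,
# deliberately skipping the apiserver manifest
kubeadm init phase control-plane controller-manager --config kubeadm-config.yaml
kubeadm init phase control-plane scheduler --config kubeadm-config.yaml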
And the missing kube-apiserver would have to run on other nodes as a systemd service or a container.
I tested this concept with my local VM clusters and it can work, but applying it with kubespray might be tricky, as you mentioned. :)
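A minimal sketch of what such a systemd unit could look like; the unit layout, binary path, and environment file are assumptions, not something kubespray generates today:
[Unit]
Description=Kubernetes API Server (dedicated node)
After=network-online.target
Wants=network-online.target

[Service]
# KUBE_APISERVER_ARGS carries the certificate and etcd flags sketched above
EnvironmentFile=/etc/kubernetes/kube-apiserver.env
ExecStart=/usr/local/bin/kube-apiserver $KUBE_APISERVER_ARGS
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target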
The OpenAI stuff isn't really relevant. As written in their articles, they have very specific workloads and a quite particular model.
I agree with you: they have specific workloads and a particular model, so it might not be needed in most common cases!
I think this issue can be closed without any further work :)
I don't think whether the api-server runs on top of kubelet as a container or directly as a systemd service has much impact on performance (and systemd can run services exactly like containers, in fact).
Besides, running it outside of the cluster would forgo some advantages, like monitoring it like any other workload, etc.
What is interesting though is breaking up the control plane into several groups.
Instead of closing it, you could modify the title and description, if you want :).
I don't think whether the api-server runs on top of kubelet as a container or directly as a systemd service has much impact on performance (and systemd can run services exactly like containers, in fact).
@VannTen Yeah, it does not affect performance whether the server runs under systemd or as a container.
But my point is a dedicated node for kube-apiserver: a node that hosts only kube-apiserver and no other workloads.
I think this title and description might be confusing about what I want to say :)
For now, this feature is usually not needed.
Can you suggest a more suitable title for this?
Thank you for your interest :)
But my point is a dedicated node for kube-apiserver: a node that hosts only kube-apiserver and no other workloads.
Yeah I get it. But whether you run with systemd or kubelet is orthogonal to using dedicated nodes.
Can you suggest a more suitable title for this?
What I'm thinking about is "Ability to separate control plane components on dedicated nodes" but it might be a little different from what you intended.
Yeah I get it. But whether you run with systemd or kubelet is orthogonal to using dedicated nodes.
I get what you intended now.
As you said, using a dedicated node is one thing, and how the API server is run is another (systemd or container / static pods, whatever).
I will change this issue title based on your opinion :) Thank you! 👍
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.