kind
Configure capacity of the worker nodes
Would it be possible to set the capacity of the worker nodes when the cluster is created?
Can you elaborate a bit more? What's your use case?
@aojea Doing some scheduler work, and I would like to take the CPU and memory capacities of each node into account. I could use labels for this, but was wondering if it is possible to do this when the cluster is set up? Also, if labels are the only option, would it be possible to tag each node with particular labels from the initialisation script?
Well, that seems interesting. @BenTheElder what do you think? Basically the worker nodes are docker containers, so we should be able to use docker resource constraints to limit them: https://docs.docker.com/config/containers/resource_constraints/ However, I don't know how this will work with nested cgroups :thinking:
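As a rough sketch of what that could mean in practice (not anything kind implements; container names are whatever kind assigned, e.g. `kind-worker`), the constraints could be applied by hand to an existing node container with `docker update`:

```shell
# Illustrative only: clamp a running kind node container by hand.
# Requires a running Docker daemon and an existing kind cluster.
docker update --cpus "1" --memory "100M" --memory-swap "100M" kind-worker

# Check the limits that were actually applied (0 means "unlimited"):
docker inspect -f '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' kind-worker
```

This only constrains the container's cgroup; it says nothing yet about what the kubelet inside will report.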
I don't know how this will work with nested cgroups
I might be wrong, but I don't think setting resource upper bounds will impact the current cgroup architecture. I do see performance issues with starving the node of resources, though.
I'm thinking about the UX side of things too; Docker resource constraints are pretty granular. Maybe we only expose some subset of the constraints, or maybe abstract them all together?
Feel free to try this out but IIRC this doesn't work.
Similarly if swap is enabled on the host memory limits won't work on your pods either.
I'm working on decoupling us from docker's command line. When that is complete and we experiment again with support for ignite and other backends, some of those can actually limit things, because while they are based around running container images, they use VMs :)
docker resource constraints are working for me with swap. I'll send a PR implementing it; I have one node limited to 100M in this example.
/assign
That of course works but ... does it actually limit everything on the node? Have you deployed a pod trying to use more? What does kubelet report?
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
# the control plane node
- role: control-plane
- role: worker
  constraints:
    memory: "100m"
    cpu: "1"
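For completeness, the proposed config above would presumably be consumed like any other kind config (the `constraints` field is a proposal here, not a shipped feature, and the filename is illustrative):

```shell
kind create cluster --config cluster-with-constraints.yaml
```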
from https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#specify-a-memory-request-and-a-memory-limit
I modified the example to directly try to use 1.5G of memory:
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "1500M", "--vm-hang", "1"]
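The experiment can be reproduced roughly as follows (assuming `kubectl` points at the kind cluster, the pod spec is saved as `memory-demo.yaml`, and the worker is named `kind-worker`):

```shell
# Create the namespace used by the example and apply the pod spec.
kubectl create namespace mem-example
kubectl apply -f memory-demo.yaml

# Watch how long the pod takes to start, then check what the kubelet
# reports for the node; Capacity still reflects the full host.
kubectl -n mem-example get pod memory-demo -w
kubectl describe node kind-worker | grep -A 6 'Capacity:'
```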
The pod takes more than 4 mins to be created, so it doesn't seem to be a hard limit; maybe we should tweak something on cgroups. But checking inside the node, it really does seem to be limiting the memory:
Tasks: 19 total, 1 running, 18 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 2.5 sy, 0.0 ni, 16.7 id, 80.3 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 32147.3 total, 16816.6 free, 1885.6 used, 13445.2 buff/cache
MiB Swap: 2055.0 total, 901.4 free, 1153.6 used. 29866.1 avail Mem
USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
root 20 0 140504 4916 0 S 4.3 0.0 1:16.80 kube-proxy
root 20 0 130236 1720 0 D 3.7 0.0 0:30.99 kindnetd
root 20 0 2214724 70912 60684 S 3.3 0.2 0:37.25 kubelet
root 20 0 1587948 37516 24 D 3.0 0.1 0:36.98 stress
root 20 0 2210024 30812 23940 S 2.7 0.1 0:34.11 containerd
root 20 0 9336 4180 4180 S 1.3 0.0 0:01.93 containerd-shim
root 20 0 10744 4180 4180 S 0.7 0.0 0:01.70 containerd-shim
root 19 -1 22656 6684 6508 S 0.3 0.0 0:01.78 systemd-journal
root 20 0 6024 2756 2648 R 0.3 0.0 0:00.11 top
root 20 0 17524 7688 7688 S 0.0 0.0 0:00.53 systemd
root 20 0 10744 4180 4180 S 0.0 0.0 0:02.67 containerd-shim
root 20 0 1024 0 0 S 0.0 0.0 0:00.00 pause
root 20 0 9336 4180 4180 S 0.0 0.0 0:02.23 containerd-shim
root 20 0 1024 0 0 S 0.0 0.0 0:00.00 pause
root 20 0 10744 4608 4564 S 0.0 0.0 0:00.81 containerd-shim
root 20 0 1024 0 0 S 0.0 0.0 0:00.00 pause
root 20 0 10744 3980 3980 S 0.0 0.0 0:00.91 containerd-shim
root 20 0 744 0 0 S 0.0 0.0 0:00.06 stress
root 20 0 4052 2936 2936 S 0.0 0.0 0:00.05 bash
Looking at the kernel docs, it seems that this is throttling: https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt Check the block I/O stats:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
1698a9d1be92 kind-worker 14.64% 99.42MiB / 100MiB 99.42% 4.34MB / 361kB 1.91GB / 1.04GB 155
1a1a6fb0f69a kind-control-plane 6.75% 1.268GiB / 31.39GiB 4.04% 512kB / 2.03MB 0B / 81.7MB 392
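The per-container figures above look like the output of `docker stats`; a one-shot snapshot can be taken with (container names are the ones kind assigned):

```shell
docker stats --no-stream kind-worker kind-control-plane
```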
Do we want this? Or is the idea to fail if it overcommits?
I think that there are several options:
- use a provider that uses VMs for the nodes
- implement something like lxcfs to "fake" the resources and cheat cadvisor and the kubelet
otherwise you can set the limit manually as explained here https://github.com/kubernetes-sigs/kind/issues/1524
using container constraints (cgroups) is only valid for limiting the resources, but kubelet keeps using the whole host memory and cpu resources for its calculations.
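A hedged sketch of the manual workaround referenced above (#1524): rather than cgroup limits, shrink what the kubelet advertises as allocatable by reserving resources with `system-reserved`. The field names follow kind's `kubeadmConfigPatches` and the kubelet's flags; the values are illustrative:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=28Gi,cpu=7
```

This only reduces the node's allocatable (what the scheduler sees); the kubelet still reports the full host capacity.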
Hello @aojea, this PR on cAdvisor addresses this point. I hope this will help. Thanks
That sounds nice. Do you think it has a chance of being approved?
I hope 🤷🏻♂️
Sadly no, re: cAdvisor. This doesn't leave us with spectacular options. Maybe we can trick kubelet into reading our own "vfs" or something (like lxcfs?) 😬. Semi related: #2318's solution.
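A very rough sketch of that lxcfs idea (the paths and flow here are assumptions, not anything kind does): run lxcfs on the host, then bind-mount its cgroup-aware proc files over the node container's `/proc` entries, so cAdvisor and the kubelet read the cgroup limits instead of host values:

```shell
# Assumption-heavy sketch: lxcfs must be installed, and its mount point
# must be visible inside the node container for these binds to work.
lxcfs /var/lib/lxcfs &

docker exec kind-worker \
  mount --bind /var/lib/lxcfs/proc/meminfo /proc/meminfo
docker exec kind-worker \
  mount --bind /var/lib/lxcfs/proc/cpuinfo /proc/cpuinfo
```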
Doing some scheduler work and would like to consider the CPU and memory capacities of each node. I could use labels for this...
@palade Did you mean we can limit a node's CPU and memory capacities as seen by the kubernetes cluster by assigning labels to the node? Which labels did you use? Can you give me an example? Thanks a lot.
Any progress? Will we still be able to do this?
https://github.com/kubernetes/kubernetes/issues/120832