CPUManager Support (allocate exclusive CPUs to containers)
Is your feature request related to a problem? Please describe.
I'm currently trying to bring up a single node with the CPUManager static policy using the following k0s.yaml:
```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  creationTimestamp: null
  name: k0s
spec:
  api:
    address: 192.168.100.63
    k0sApiPort: 9443
    port: 6443
    sans:
    - 192.168.100.63
    - fe80::be24:11ff:fe0c:7c87
  controllerManager: {}
  extensions:
    helm:
      charts: null
      concurrencyLevel: 5
      repositories: null
    storage:
      create_default_storage_class: false
      type: external_storage
  installConfig:
    users:
      etcdUser: etcd
      kineUser: kube-apiserver
      konnectivityUser: konnectivity-server
      kubeAPIserverUser: kube-apiserver
      kubeSchedulerUser: kube-scheduler
  konnectivity:
    adminPort: 8133
    agentPort: 8132
  network:
    calico: null
    clusterDomain: cluster.local
    dualStack: {}
    kubeProxy:
      iptables:
        minSyncPeriod: 0s
        syncPeriod: 0s
      ipvs:
        minSyncPeriod: 0s
        syncPeriod: 0s
        tcpFinTimeout: 0s
        tcpTimeout: 0s
        udpTimeout: 0s
      metricsBindAddress: 0.0.0.0:10249
      mode: iptables
    kuberouter:
      autoMTU: true
      hairpin: Enabled
      ipMasq: false
      metricsPort: 8080
      mtu: 0
      peerRouterASNs: ""
      peerRouterIPs: ""
    nodeLocalLoadBalancing:
      envoyProxy:
        apiServerBindPort: 7443
        konnectivityServerBindPort: 7132
      type: EnvoyProxy
    podCIDR: 10.244.0.0/16
    provider: kuberouter
    serviceCIDR: 10.96.0.0/12
  scheduler: {}
  storage:
    etcd:
      externalCluster: null
      peerAddress: 192.168.100.63
    type: etcd
  telemetry:
    enabled: true
  workerProfiles:
  - name: custom-cpu
    values:
      cpuManagerPolicy: static
      reservedSystemCPUs: "0-5"
```
I used this command for the installation:

```shell
k0s install controller --profile custom-cpu --single -c /etc/k0s/k0s.yaml
```
But k0s adds a conflicting parameter that prevents the CPUManager policy from being applied. Below is the resulting kubelet-config.yaml:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous: {}
  webhook:
    cacheTTL: 0s
  x509:
    clientCAFile: /var/lib/k0s/pki/ca.crt
authorization:
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: unix:///run/k0s/containerd.sock
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 0s
eventRecordQPS: 0
evictionPressureTransitionPeriod: 0s
failSwapOn: false
fileCheckFrequency: 0s
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReservedCgroup: system.slice
kubeletCgroups: /system.slice/containerd.service
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
registerWithTaints:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
reservedSystemCPUs: 0-5
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
serverTLSBootstrap: true
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
tlsMinVersion: VersionTLS12
volumePluginDir: /usr/libexec/k0s/kubelet-plugins/volume/exec
```
It adds kubeReservedCgroup and kubeletCgroups, and these appear to be hard-coded:
```shell
$ grep -r -i kubeReservedCgroup
pkg/component/worker/kubelet.go: KubeReservedCgroup string
pkg/component/worker/kubelet.go: KubeReservedCgroup: "system.slice",
pkg/component/worker/kubelet.go: preparedConfig.KubeReservedCgroup = kubeletConfigData.KubeReservedCgroup
$ grep -r -i kubeletCgroups
pkg/component/worker/kubelet.go: KubeletCgroups string
pkg/component/worker/kubelet.go: KubeletCgroups: "/system.slice/containerd.service",
pkg/component/worker/kubelet.go: preparedConfig.KubeletCgroups = kubeletConfigData.KubeletCgroups
```
With this, the kubelet daemon cannot start up and reports the following error:

```
run.go:74] "command failed" err="failed to validate kubelet configuration, error: invalid configuration: can't use reservedSystemCPUs (--reserved-cpus) with systemReservedCgroup
```
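To illustrate why kubelet rejects this config, here is a simplified sketch (not kubelet's actual source) of the mutual-exclusion check behind the error above: once `reservedSystemCPUs` is set, kubelet refuses any reserved-cgroup option, so k0s's hard-coded `kubeReservedCgroup: system.slice` makes the rendered config unconditionally invalid.

```go
package main

import (
	"errors"
	"fmt"
)

// kubeletCgroupConfig holds only the KubeletConfiguration fields
// relevant to this particular validation check.
type kubeletCgroupConfig struct {
	ReservedSystemCPUs   string
	KubeReservedCgroup   string
	SystemReservedCgroup string
}

// validateReservedCPUs mimics (in simplified form) kubelet's rule that
// reservedSystemCPUs is mutually exclusive with the reserved-cgroup options.
func validateReservedCPUs(c kubeletCgroupConfig) error {
	if c.ReservedSystemCPUs == "" {
		return nil
	}
	if c.KubeReservedCgroup != "" || c.SystemReservedCgroup != "" {
		return errors.New("invalid configuration: can't use reservedSystemCPUs (--reserved-cpus) with systemReservedCgroup or kubeReservedCgroup")
	}
	return nil
}

func main() {
	// Mirrors the rendered kubelet-config.yaml above: reservedSystemCPUs
	// comes from the worker profile, while kubeReservedCgroup is injected
	// by k0s, so validation fails.
	err := validateReservedCPUs(kubeletCgroupConfig{
		ReservedSystemCPUs: "0-5",
		KubeReservedCgroup: "system.slice",
	})
	fmt.Println(err)
}
```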
Describe the solution you would like
Support cpuManagerPolicy and reservedSystemCPUs in the kubelet configuration
Describe alternatives you've considered
No response
Additional context
No response
This relates heavily to the same findings as #4255. Essentially, we need to figure out a better way to "default" the cgroup settings without hard-coding anything, as we currently do in some places.
Thanks @jnummelin - Should I keep this issue open or close it in favor of #4255 and use that to track this feature also?
I think leaving this open is fair, as this is a real blocker, i.e. there's no way to use CPUManager with k0s right now, unfortunately.
(And I consider this a bug, since nobody expected CPUManager not to work with k0s.)
@twz123 - Noted, I will leave it open
> I think leaving this open is fair, as this is a real blocker, i.e. there's no way to use CPUManager with k0s right now, unfortunately.
> (And I consider this a bug, since nobody expected CPUManager not to work with k0s.)
To be clear, CPUManager can be used with k0s, just not together with reservedSystemCPUs. I installed k0s with the argument --kubelet-extra-args='--cpu-manager-policy=static', and kubelet runs without errors; I can see log entries from cpu_manager.
Ah, good to know. Still, the hard-coded cgroup-related settings in k0s are something that needs to be addressed somehow.
In my case (k0sctl version v0.17.5), --kubelet-extra-args='--cpu-manager-policy=static' was not enough; I also had to set the resource reservation parameters:

```yaml
installFlags:
- --debug
- --disable-components=konnectivity-server,metrics-server
- --kubelet-extra-args='--cpu-manager-policy=static --kube-reserved=cpu=500m,memory=1Gi --kube-reserved-cgroup=system.slice --kubelet-cgroups=/system.slice/containerd.service'
```
> I had also set the resource reservation parameters
Correct! I'm also specifying those (I should have mentioned that in my previous comment).
The issue is marked as stale since no activity has been recorded in 30 days
If you look at issue #4234, I found a hack that allows overriding kubelet parameters. The default kubelet-config.yaml overrides some parameters even if you try to pass them directly as extra args. However, you can build your own kubelet-config.yaml file and pass it to kubelet with --kubelet-extra-args=--config=/var/lib/k0s/kubelet-ext-config.yaml.
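As an untested sketch of what such an override file might look like: the values below are assumptions pieced together from the rendered kubelet-config.yaml earlier in this thread, since passing `--config` replaces the whole configuration, so the k0s-specific paths must be carried over by hand.

```yaml
# /var/lib/k0s/kubelet-ext-config.yaml (hypothetical; values assumed from
# the generated config above, adjust to your environment)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# The settings this issue is about; the cgroup fields that k0s would
# otherwise hard-code are simply omitted here.
cpuManagerPolicy: static
reservedSystemCPUs: "0-5"
# k0s-specific paths that must be restated because --config replaces the
# default file entirely.
clusterDomain: cluster.local
containerRuntimeEndpoint: unix:///run/k0s/containerd.sock
authentication:
  x509:
    clientCAFile: /var/lib/k0s/pki/ca.crt
volumePluginDir: /usr/libexec/k0s/kubelet-plugins/volume/exec
```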
But you'll then face another problem I have not yet solved: putting limits on a cgroup and running k0s inside that cgroup works, but kubelet will still use the system limits, and the eviction mechanism does not work as expected.