cri-dockerd
BestEffort pods are using swap
What happened?
I already opened an issue in the Kubernetes repository, which led me here.
I was testing swap support and ran into unexpected behavior. The documentation states that only pods in the Burstable QoS class can use the host's swap memory. However, I created two deployments, each with 1 replica of ubuntu: one in the Burstable class and one in the BestEffort class. In each pod I ran the command stress --vm 1 --vm-bytes 6G --vm-hang 0 to observe the memory consumption. The host has 4GB of RAM and 5GB of swap. In both cases, the pod started using swap once its usage exceeded the available RAM. Wasn't the BestEffort pod supposed to be killed and restarted when it reached the limit of the host's RAM? Note that the kubelet is configured with swapBehavior=LimitedSwap. I attached two pictures showing the host's normal memory consumption and its consumption after running the stress command inside the pod.
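To make the expected behavior concrete, here is a minimal bash sketch of the LimitedSwap rules as the Kubernetes node-swap documentation describes them: only Burstable containers get swap, sized as (memory request / node memory) * node swap, while BestEffort and Guaranteed containers get none, so their memory.swap.max should be 0. The helper names qos_class and expected_swap_max are hypothetical. The sketch assumes one container per pod with memory given in MiB, and it ignores the further exclusions the docs list, such as high-priority pods. The kubelet does this arithmetic in Go, so rounding may differ slightly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the documented LimitedSwap rules.
# Simplification: a single container per pod.

# qos_class MEM_REQ MEM_LIM CPU_REQ CPU_LIM   (empty string = not set)
# Quantities are compared as plain strings, so "1Gi" != "1024Mi" here.
qos_class() {
  local mem_req=$1 mem_lim=$2 cpu_req=$3 cpu_lim=$4
  # No requests and no limits at all -> BestEffort.
  if [[ -z $mem_req && -z $mem_lim && -z $cpu_req && -z $cpu_lim ]]; then
    echo BestEffort
    return 0
  fi
  # Kubernetes defaults a missing request to the limit when a limit is set.
  [[ -z $mem_req ]] && mem_req=$mem_lim
  [[ -z $cpu_req ]] && cpu_req=$cpu_lim
  # Guaranteed: CPU and memory limits set, and requests equal to limits.
  if [[ -n $mem_lim && -n $cpu_lim && $mem_req == "$mem_lim" && $cpu_req == "$cpu_lim" ]]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}

# expected_swap_max QOS MEM_REQ_MIB NODE_MEM_MIB SWAP_MIB
# Prints the expected memory.swap.max in bytes under LimitedSwap:
#   Burstable:             request / node_memory * swap
#   BestEffort/Guaranteed: 0 (no swap at all)
expected_swap_max() {
  local qos=$1 req_mib=$2 node_mib=$3 swap_mib=$4
  if [[ $qos != Burstable || -z $req_mib || $req_mib -eq 0 ]]; then
    echo 0
    return 0
  fi
  if (( node_mib <= 0 )); then
    echo "node memory must be > 0" >&2
    return 1
  fi
  # Multiply before dividing to keep integer precision; result in bytes.
  echo $(( req_mib * swap_mib * 1024 * 1024 / node_mib ))
}
```

With the numbers from this report (treating 4GB as 4096 MiB and 5GB as 5120 MiB), the BestEffort ubuntu pod from the reproduction steps should get 0, while a Burstable container requesting 1024 MiB would get 1342177280 bytes (1.25 GiB).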
What did you expect to happen?
I expected the BestEffort pod to be killed when it consumed more RAM than the host has available.
How can we reproduce it (as minimally and precisely as possible)?
- set up a VM running Ubuntu 22.04 with 4GB of RAM
- set the swap partition to 5GB
- install docker, cri-dockerd and the Kubernetes packages using the versions listed below
- configure the kubelet with the config provided below
- install the Calico CNI
- after the cluster is bootstrapped, deploy the following deployment:
$ cat test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu-deployment
  labels:
    app: ubuntu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - name: ubuntu
        image: ubuntu:22.04
        resources: {}  # no requests or limits, so the pod is BestEffort
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
- this should deploy a BestEffort pod. You can check this by running kubectl get pod <pod-name> --output=yaml and looking at status.qosClass.
- exec into the pod and run apt update && apt install -y stress. Then run stress --vm 1 --vm-bytes 6G --vm-hang 0
- find the node where the pod is running with kubectl get po -o wide, then ssh to that node and run htop. You should now see that the deployed BestEffort pod is consuming swap memory, which, according to the docs, it shouldn't.
- if you exec into the pod and check memory.swap.max, it is set to max. From what I understand, even though swapBehavior is set to LimitedSwap in the kubelet config, cri-dockerd may somehow be setting memory.swap.max for the container's cgroup to max.
$ cat /sys/fs/cgroup/memory.swap.max
max
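The same check can be scripted so it is easy to rerun against both the Burstable and the BestEffort pod. This is a minimal bash sketch: check_swap_limit is a hypothetical helper that reads the cgroup v2 interface files memory.swap.max and memory.swap.current from a given cgroup directory and compares the limit with an expected value (0 for a BestEffort container under LimitedSwap).

```shell
#!/usr/bin/env bash
# check_swap_limit CGROUP_DIR EXPECTED
# Hypothetical helper: compares a cgroup v2 directory's memory.swap.max with
# an expected value and prints current swap usage alongside it.
# Returns 0 on match, 1 on mismatch, 2 if the file is missing.
check_swap_limit() {
  local dir=$1 expected=$2 actual current
  if [[ ! -r $dir/memory.swap.max ]]; then
    echo "no readable memory.swap.max in $dir (not a cgroup v2 memory cgroup?)" >&2
    return 2
  fi
  actual=$(<"$dir/memory.swap.max")
  # memory.swap.current holds the bytes of swap currently in use.
  current=$(cat "$dir/memory.swap.current" 2>/dev/null || echo unknown)
  echo "memory.swap.max=$actual memory.swap.current=$current"
  if [[ $actual != "$expected" ]]; then
    echo "MISMATCH: expected memory.swap.max=$expected" >&2
    return 1
  fi
  return 0
}
```

Inside the pod, as in the output above, the container's cgroup is visible at /sys/fs/cgroup, so check_swap_limit /sys/fs/cgroup 0 should report a mismatch on the affected setup. On the node, with the systemd cgroup driver, BestEffort pods are expected to sit somewhere under /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/; that layout is an assumption, so locate the container's directory (for example with find) before passing it in.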
Anything else we need to know?
I am using cgroup v2.
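Since memory.swap.max is a cgroup v2 interface file, it can be worth confirming that the unified hierarchy is actually mounted. A small sketch, assuming GNU coreutils stat, whose %T format prints the filesystem type name; on a cgroup v2 host /sys/fs/cgroup reports cgroup2fs. The name is_cgroup_v2 is a hypothetical helper.

```shell
#!/usr/bin/env bash
# is_cgroup_v2 [MOUNTPOINT]
# Hypothetical helper: succeeds if MOUNTPOINT (default /sys/fs/cgroup) is a
# cgroup v2 (unified hierarchy) mount, fails otherwise.
is_cgroup_v2() {
  local mnt=${1:-/sys/fs/cgroup}
  # GNU stat: -f queries the filesystem, %T prints its type in human form.
  [[ $(stat -f -c %T "$mnt" 2>/dev/null) == cgroup2fs ]]
}
```

Usage: is_cgroup_v2 && echo "cgroup v2" || echo "not cgroup v2".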
Here is my kubelet config.
$ cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
enableServer: true
evictionPressureTransitionPeriod: 0s
failSwapOn: false
featureGates:
  NodeSwap: true
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap:
  swapBehavior: LimitedSwap
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
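The swap-related settings in this config are failSwapOn: false, the NodeSwap feature gate, and memorySwap.swapBehavior: LimitedSwap. A rough sketch for checking that a config file contains all three follows. check_kubelet_swap_config is a hypothetical helper, and it uses grep rather than a YAML parser, so it only verifies that matching lines exist, not that they sit under the right keys.

```shell
#!/usr/bin/env bash
# check_kubelet_swap_config [CONFIG]
# Hypothetical helper: rough, line-based check (NOT a YAML parser) that a
# kubelet config file contains the swap settings used in this report.
# Returns 0 if all are present, 1 if any is missing, 2 if unreadable.
check_kubelet_swap_config() {
  local cfg=${1:-/var/lib/kubelet/config.yaml}
  local missing=0 pattern
  if [[ ! -r $cfg ]]; then
    echo "cannot read $cfg" >&2
    return 2
  fi
  for pattern in \
    '^failSwapOn:[[:space:]]*false' \
    '^[[:space:]]*NodeSwap:[[:space:]]*true' \
    '^[[:space:]]*swapBehavior:[[:space:]]*LimitedSwap'; do
    # -E: extended regex; -q: quiet, only the exit status matters.
    if ! grep -Eq -- "$pattern" "$cfg"; then
      echo "missing: $pattern" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

This inspects the file on disk, not the running kubelet. The configuration the kubelet actually loaded can be retrieved through the API server with kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz".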
Kubernetes version
$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3
Cloud provider
OS version
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux fs-kube-dev-1 5.15.0-100-generic #110-Ubuntu SMP Wed Feb 7 13:27:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Install tools
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.3", GitCommit:"6813625b7cd706db5bc7388921be03071e1a492d", GitTreeState:"clean", BuildDate:"2024-03-15T00:06:16Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"}
Container runtime (CRI) and version (if applicable)
$ cri-dockerd --version
cri-dockerd 0.3.11 (9a8a9fe)
Related plugins (CNI, CSI, ...) and versions (if applicable)
calico: version: 3.27.2