amazon-eks-ami
allowed-unsafe-sysctls not working in BootstrapArguments
What happened:
According to the Kubernetes 1.11 documentation, --allowed-unsafe-sysctls 'net.core.rmem_max' should be a valid kubelet flag.
We're currently passing this into EKS via BootstrapArguments:
--kubelet-extra-args "--allowed-unsafe-sysctls 'net.core.rmem_max,net.core.netdev_max_backlog'"
and are using this Kubernetes deployment:
apiVersion: apps/v1beta2
kind: Deployment
...
spec:
  ...
  template:
    metadata:
      ...
      annotations:
        security.alpha.kubernetes.io/unsafe-sysctls: net.core.rmem_max=10485760,net.core.netdev_max_backlog=100000
But we don't see any effect when setting the properties:
sysctl: cannot stat /proc/sys/net/core/rmem_max: No such file or directory
What you expected to happen:
Running sysctl -w net.core.rmem_max=10485760 in the pod should work.
How to reproduce it (as minimally and precisely as possible):
See above.
Anything else we need to know?:
Environment:
- AWS Region: us-east-1
- Instance Type(s): m5.xlarge
- EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.1
- Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.11
- AMI Version: ami-0c24db5df6badc35a, also tried ami-0c5b63ec54dd3fc38
- Kernel (e.g. uname -a): Linux ... 4.14.88-88.76.amzn2.x86_64 #1 SMP Mon Jan 7 18:43:26 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Do you have the same issue as I do? Can you confirm? What did bootstrap.sh actually run on the box?
I believe what's going on here is that you cannot use spaces, due to how bootstrap.sh is set up.
I put the example from a similar issue into my arg for CloudFormation:
--kubelet-extra-args '--node-labels=something=hello,somethingelse=bye --register-with-taints=taint1=true'
+ /etc/eks/bootstrap.sh metrictank --kubelet-extra-args ''\''--node-labels=something=hello,somethingelse=bye' '--register-with-taints=taint1=true'\'''
Cluster "kubernetes" set.
Which will only give you:
+ key=--kubelet-extra-args
+ case $key in
+ KUBELET_EXTRA_ARGS=''\''--node-labels=something=hello,somethingelse=bye'
+ shift
+ shift
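If that's the root cause, the usual workaround is to keep the whole value as one double-quoted shell word and use the --flag=value form so no inner single quotes are needed. A minimal sketch, assuming the call is made directly from user data (the cluster name is a placeholder):

# Sketch only: "my-cluster" is a placeholder. The point is that the value of
# --kubelet-extra-args stays a single double-quoted word with no nested quotes,
# so bootstrap.sh's word splitting keeps it intact.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args "--node-labels=something=hello,somethingelse=bye --allowed-unsafe-sysctls=net.core.rmem_max,net.core.netdev_max_backlog"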
We're having the same problem when calling bootstrap.sh with, for example:
/etc/eks/bootstrap.sh cluster1 --kubelet-extra-args \
"--allowed-unsafe-sysctls 'net.*' \
--kube-reserved cpu=200m,memory=0.5Gi,ephemeral-storage=1Gi \
--system-reserved cpu=200m,memory=0.2Gi,ephemeral-storage=1Gi"
kubelet doesn't run with KUBELET_EXTRA_ARGS at all in this case. Using image v1.12.10-eks-ee8ff
We're seeing this not working, too:
/etc/eks/bootstrap.sh \
...
--kubelet-extra-args "--allowed-unsafe-sysctls 'net.core.somaxconn'" \
...
The combination of quoting and working around the flapping kubelet stuff I describe here got it going: https://github.com/awslabs/amazon-eks-ami/issues/288#issuecomment-541375091
Though I guess on top of the stuff mentioned there we also had to get our pod security policies right to allow this particular sysctl.
Is there a finalized way that anyone was able to get net.core.rmem_max configured in a pod?
My understanding is that the parameter net.core.rmem_max is only applicable at the host (node) level. Some parameters are not supported when trying to configure them as namespaced sysctls. (This is also mentioned in the docs [1] [2]: only namespaced sysctls are configurable via the pod securityContext within Kubernetes.)
According to my testing (using amazon-eks-node-1.14-v20200122, ami-0bf3e2c598f50ba82, in us-east-1), to change this parameter you have to set hostNetwork: true so the Pod uses the host network instead of a containerized network namespace. Then run a privileged container to apply your own kernel configuration on the Node:
- Option 1: Use sysctl to update net.core.rmem_max:
  sysctl -w net.core.rmem_max=10485760
- Option 2: Update it through the filesystem:
  echo '10485760' > /proc/sys/net/core/rmem_max
But please note that this configuration also applies to other applications/running Pods and overwrites the default setting on your worker node; the changed parameter is host-wide. Make sure to test before applying it in production. You may have to place your Pods using node labels to ensure your application runs on worker nodes with the proper kernel parameters.
[1] https://github.com/kubernetes/kubernetes/issues/77546#issuecomment-506885448
[2] https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#setting-sysctls-for-a-pod
Note: The Linux kernel is Linux 4.14.158-129.185.amzn2.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
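Putting the above together, a minimal sketch of a host-network, privileged Pod that applies the node-level setting (the Pod name is illustrative and this only covers the node the Pod lands on):

apiVersion: v1
kind: Pod
metadata:
  name: rmem-tuner                 # illustrative name, not from this thread
spec:
  hostNetwork: true                # share the node's network namespace, so net.core.* is the host's
  containers:
    - name: sysctl
      image: docker.io/alpine:3.13 # any small image with a shell works
      securityContext:
        privileged: true           # needed so /proc/sys is writable from the container
      command:
        - sh
        - -c
        - |
          echo 10485760 > /proc/sys/net/core/rmem_max
          tail -f /dev/null        # keep the pod running after applying the value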
Similar issues
- https://groups.google.com/forum/#!topic/docker-dev/kFZdaIxoxbg
- https://forums.docker.com/t/how-to-tune-kernel-properties-in-docker-images/25291
Just in case somebody faces the same issue.
I was able to allow net.core.somaxconn changes with the following:
- Args to bootstrap.sh (including some additional ones):
--kubelet-extra-args \
"--kube-reserved cpu=100m,memory=200Mi,ephemeral-storage=512Mi \
--allowed-unsafe-sysctls=net.core.somaxconn \
--cluster-dns=169.254.20.10"
- Creating the PodSecurityPolicy and related RBAC objects:
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: sysctl
spec:
  allowPrivilegeEscalation: false
  allowedUnsafeSysctls:
    - net.core.somaxconn
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
    - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-sysctl
rules:
  - apiGroups:
      - policy
    resourceNames:
      - sysctl
    resources:
      - podsecuritypolicies
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-sysctl
  namespace: ${SOME_NAMESPACE}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-sysctl
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts
    namespace: default
Hi all,
I am also facing the same issue. I have created a PodSecurityPolicy as @maksymivash suggested, but I am still not able to set somaxconn through the deployment:
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: sysctl
spec:
  allowPrivilegeEscalation: false
  allowedUnsafeSysctls:
    - net.core.somaxconn
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
    - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-sysctl
rules:
  - apiGroups:
      - policy
    resourceNames:
      - sysctl
    resources:
      - podsecuritypolicies
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-sysctl
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-sysctl
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:my-namespace
and I have the below annotation in my deployment file, but I am still not able to get this working. Can someone help me out?
Seems to be resolved now. Instead of the annotation, setting the required sysctl in the pod/deployment securityContext seems to be doing the job.
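For reference, a minimal sketch of what that looks like with a pod-level securityContext (the names, image, and value are only illustrative; the node's kubelet still has to allow the sysctl via --allowed-unsafe-sysctls):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: somaxconn-example          # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: somaxconn-example
  template:
    metadata:
      labels:
        app: somaxconn-example
    spec:
      securityContext:
        sysctls:                   # applied to the pod's namespaces at creation time
          - name: net.core.somaxconn
            value: "1024"          # example value
      containers:
        - name: app
          image: docker.io/alpine:3.13   # placeholder image
          command:
            - sh
            - -c
            - cat /proc/sys/net/core/somaxconn && tail -f /dev/null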
I have the same problem with v1.18.9-eks-d1db3c :( I even tried adding annotations. Should we set this parameter while creating the EKS cluster?
PodSecurityPolicy
Per https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#podsecuritypolicy, it has been deprecated and is removed in Kubernetes v1.25. Does anyone know an alternative to this?
Thanks.
Hacky solution
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctls
  namespace: kube-system
  annotations:
    kubernetes.io/description: |
      This daemon set configures the kernel parameters required to optimize networking performance.
    kubernetes.io/psp: eks.privileged
spec:
  selector:
    matchLabels:
      name: node-sysctls
  template:
    metadata:
      labels:
        name: node-sysctls
    spec:
      volumes:
        - name: kube-api-access-5hrk8
          projected:
            sources:
              - serviceAccountToken:
                  expirationSeconds: 3607
                  path: token
              - configMap:
                  name: kube-root-ca.crt
                  items:
                    - key: ca.crt
                      path: ca.crt
              - downwardAPI:
                  items:
                    - path: namespace
                      fieldRef:
                        apiVersion: v1
                        fieldPath: metadata.namespace
            defaultMode: 420
      containers:
        - name: shell
          image: docker.io/alpine:3.13
          command:
            - nsenter
          args:
            - '-t'
            - '1'
            - '-m'
            - '-u'
            - '-i'
            - '-n'
            - bash
            - '--'
            - '-c'
            - |
              # We are in the host PID namespace, so we can see the host's /proc so we can set sysctls
              sysctl -w net.core.rmem_max=2500000
              echo "Set sysctls... sleeping"
              tail -f /dev/null
          resources:
            limits:
              cpu: 50m
              memory: 64Mi
            requests:
              cpu: 50m
              memory: 64Mi
          volumeMounts:
            - name: kube-api-access-5hrk8
              readOnly: true
              mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
      hostNetwork: true
      hostPID: true
      hostIPC: true
      terminationGracePeriodSeconds: 1
      tolerations:
        - operator: Exists
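If you go this route, one way to check that the DaemonSet actually took effect (assuming kubectl access; the node name is a placeholder):

# Wait until a pod is running on every node
kubectl -n kube-system rollout status daemonset/node-sysctls

# The container keeps running after applying the sysctls, so its log shows what ran
kubectl -n kube-system logs daemonset/node-sysctls

# Spot-check a node: the debug pod shares the host network namespace,
# so /proc/sys/net/core shows the node's values (needs kubectl debug, Kubernetes 1.20+)
kubectl debug node/<node-name> -it --image=docker.io/alpine:3.13 -- cat /proc/sys/net/core/rmem_max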