
allowed-unsafe-sysctls not working in BootstrapArguments

Open bkempe opened this issue 6 years ago • 12 comments

What happened:

According to the Kubernetes 1.11 documentation

--allowed-unsafe-sysctls 'net.core.rmem_max'

should be a valid kubelet flag.

We're currently passing this into EKS via BootstrapArguments: --kubelet-extra-args "--allowed-unsafe-sysctls 'net.core.rmem_max,net.core.netdev_max_backlog'" and are using this k8s deployment

apiVersion: apps/v1beta2
kind: Deployment
...
spec:
...
  template:
    metadata:
...
      annotations:
        security.alpha.kubernetes.io/unsafe-sysctls: net.core.rmem_max=10485760,net.core.netdev_max_backlog=100000

But we don't see any effect when setting the properties:

sysctl: cannot stat /proc/sys/net/core/rmem_max: No such file or directory

What you expected to happen:

Running sysctl -w net.core.rmem_max=10485760 in the pod should work.

How to reproduce it (as minimally and precisely as possible):

See above.

Anything else we need to know?:

Environment:

  • AWS Region: us-east-1
  • Instance Type(s): m5.xlarge
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.1
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.11
  • AMI Version: ami-0c24db5df6badc35a, also tried ami-0c5b63ec54dd3fc38
  • Kernel (e.g. uname -a): Linux ... 4.14.88-88.76.amzn2.x86_64 #1 SMP Mon Jan 7 18:43:26 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

bkempe avatar Feb 14 '19 20:02 bkempe

Do you have the same issue as I do? Can you confirm what bootstrap.sh actually ran on the box?

I believe what's going on here is that you cannot use spaces, due to how bootstrap.sh is set up.

I put the example from a similar issue into my CloudFormation argument:

--kubelet-extra-args '--node-labels=something=hello,somethingelse=bye --register-with-taints=taint1=true'
+ /etc/eks/bootstrap.sh metrictank --kubelet-extra-args ''\''--node-labels=something=hello,somethingelse=bye' '--register-with-taints=taint1=true'\'''
Cluster "kubernetes" set.

Which will only give you:

+ key=--kubelet-extra-args
+ case $key in
+ KUBELET_EXTRA_ARGS=''\''--node-labels=something=hello,somethingelse=bye'
+ shift
+ shift
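If you want to see what the kubelet actually ended up with, a rough check on the worker node (not specific to any AMI version) is to look at the running process and the cloud-init output:

ps -ef | grep [k]ubelet                                        # shows the flags kubelet was started with
sudo grep -A2 'bootstrap.sh' /var/log/cloud-init-output.log    # what the user data actually ran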

tehlers320 avatar Feb 20 '19 02:02 tehlers320

We're having the same problem when calling bootstrap.sh with, for example:

/etc/eks/bootstrap.sh cluster1 --kubelet-extra-args \
  "--allowed-unsafe-sysctls 'net.*' \
  --kube-reserved cpu=200m,memory=0.5Gi,ephemeral-storage=1Gi \
  --system-reserved cpu=200m,memory=0.2Gi,ephemeral-storage=1Gi"

kubelet doesn't pick up KUBELET_EXTRA_ARGS at all in this case. We're using image v1.12.10-eks-ee8ff.
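For reference, a variant that avoids the nested single quotes and the embedded space in the flag (just a sketch; haven't verified whether this alone fixes it) would be the --flag=value form on a single line:

/etc/eks/bootstrap.sh cluster1 --kubelet-extra-args "--allowed-unsafe-sysctls=net.* --kube-reserved=cpu=200m,memory=0.5Gi,ephemeral-storage=1Gi --system-reserved=cpu=200m,memory=0.2Gi,ephemeral-storage=1Gi"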

thorro avatar Oct 01 '19 06:10 thorro

We're seeing this not working, too:

/etc/eks/bootstrap.sh \
...
  --kubelet-extra-args "--allowed-unsafe-sysctls 'net.core.somaxconn'" \
...

kevincantu avatar Oct 12 '19 19:10 kevincantu

The combination of quoting and working around the flapping kubelet issue I describe here got it going: https://github.com/awslabs/amazon-eks-ami/issues/288#issuecomment-541375091

Though on top of what's mentioned there, we also had to get our pod security policies right to allow this particular sysctl.

kevincantu avatar Oct 25 '19 00:10 kevincantu

Has anyone found a definitive way to get net.core.rmem_max configured in a pod?

ghost avatar Feb 03 '20 00:02 ghost

My understanding is that the parameter net.core.rmem_max is only applicable at the host (node) level. Some parameters cannot be configured through namespace-level sysctls. (This is also mentioned in the docs [1] [2]: only namespaced sysctls are configurable via the pod securityContext within Kubernetes.)

According to my testing (I was using amazon-eks-node-1.14-v20200122, ami-0bf3e2c598f50ba82, in us-east-1), to change this parameter you have to use hostNetwork: true so the Pod uses the host network instead of a containerized network namespace, and then run a privileged container to apply your own kernel configuration on the node:

  • Option 1: Use sysctl to update it, e.g. sysctl -w net.core.rmem_max=10485760
  • Option 2: Update it through the filesystem: echo '10485760' > /proc/sys/net/core/rmem_max

But please note that the configuration will also apply to other applications and running Pods, and it will overwrite the default setting on your worker node; the change is host-wide. Make sure to test before applying it in production. You may also have to place your Pods by node label to ensure your application runs only on worker nodes with the proper kernel parameters.
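As a rough illustration of Option 1, a one-shot privileged Pod along these lines (name, image, and the node-selector label are just placeholders) can apply the setting on a node:

apiVersion: v1
kind: Pod
metadata:
  name: set-rmem-max            # placeholder name
spec:
  hostNetwork: true             # use the host's network namespace
  restartPolicy: Never
  nodeSelector:
    sysctl-tuned: "true"        # placeholder label to target specific worker nodes
  containers:
  - name: sysctl
    image: busybox              # placeholder image; anything with a sysctl binary works
    securityContext:
      privileged: true          # required to write host-level sysctls
    command: ["sysctl", "-w", "net.core.rmem_max=10485760"]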

[1] https://github.com/kubernetes/kubernetes/issues/77546#issuecomment-506885448 [2] https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#setting-sysctls-for-a-pod

Note: The Linux kernel is Linux 4.14.158-129.185.amzn2.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux

Similar issues

  • https://groups.google.com/forum/#!topic/docker-dev/kFZdaIxoxbg
  • https://forums.docker.com/t/how-to-tune-kernel-properties-in-docker-images/25291

0xlen avatar Mar 03 '20 14:03 0xlen

In case somebody faces the same issue: I was able to allow net.core.somaxconn changes with the following.

  1. Arguments to bootstrap.sh (including some additional ones):
    --kubelet-extra-args \
        "--kube-reserved cpu=100m,memory=200Mi,ephemeral-storage=512Mi \
        --allowed-unsafe-sysctls=net.core.somaxconn \
        --cluster-dns=169.254.20.10"
  2. Create the PodSecurityPolicy and related RBAC objects:
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: sysctl
spec:
  allowPrivilegeEscalation: false
  allowedUnsafeSysctls:
  - net.core.somaxconn
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-sysctl
rules:
- apiGroups:
  - policy
  resourceNames:
  - sysctl
  resources:
  - podsecuritypolicies
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-sysctl
  namespace: ${SOME_NAMESPACE}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-sysctl
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
  namespace: default

maksymivash avatar Sep 03 '20 10:09 maksymivash

Hi All,

I am also facing the same issue. I have created the PodSecurityPolicy as @maksymivash suggested, but I am still not able to set somaxconn through the deployment:

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: sysctl
spec:
  allowPrivilegeEscalation: false
  allowedUnsafeSysctls:
  - net.core.somaxconn
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-sysctl
rules:
- apiGroups:
  - policy
  resourceNames:
  - sysctl
  resources:
  - podsecuritypolicies
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-sysctl
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-sysctl
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:my-namespace

And I have the below annotation in my deployment file, but I still can't get this working. Can someone help me out?

[screenshot of the deployment annotation]

thisis2394 avatar Oct 14 '20 10:10 thisis2394

Seems to be resolved now. Instead of the annotation, setting the required sysctl in the pod/deployment securityContext seems to be doing the job:

[screenshot of the pod securityContext sysctls]
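In case it helps anyone, a minimal sketch of that (name, image, and value are placeholders; the kubelet flag and PSP above still need to allow net.core.somaxconn):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: somaxconn-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: somaxconn-demo
  template:
    metadata:
      labels:
        app: somaxconn-demo
    spec:
      securityContext:
        sysctls:                     # pod-level securityContext, not the container's
        - name: net.core.somaxconn
          value: "1024"              # placeholder value
      containers:
      - name: app
        image: nginx                 # placeholder image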

thisis2394 avatar Oct 15 '20 07:10 thisis2394


I have the same problem with v1.18.9-eks-d1db3c :( I even tried adding annotations. Should we add this parameter while creating the EKS cluster?

pekermert avatar Feb 08 '21 17:02 pekermert

PodSecurityPolicy

Per https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/#podsecuritypolicy, PodSecurityPolicy is deprecated and is being removed in Kubernetes v1.25. Does anyone know of an alternative to this?

Thanks.

tigerinus avatar May 18 '22 15:05 tigerinus

Hacky solution

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctls
  namespace: kube-system
  annotations:
    kubernetes.io/description: |
      This daemon set configures the kernel parameters required to optimize networking performance.
    kubernetes.io/psp: eks.privileged
spec:
  selector:
    matchLabels:
      name: node-sysctls
  template:
    metadata:
      labels:
        name: node-sysctls
    spec:
      volumes:
        - name: kube-api-access-5hrk8
          projected:
            sources:
              - serviceAccountToken:
                  expirationSeconds: 3607
                  path: token
              - configMap:
                  name: kube-root-ca.crt
                  items:
                    - key: ca.crt
                      path: ca.crt
              - downwardAPI:
                  items:
                    - path: namespace
                      fieldRef:
                        apiVersion: v1
                        fieldPath: metadata.namespace
            defaultMode: 420
      containers:
        - name: shell
          image: docker.io/alpine:3.13
          command:
            - nsenter
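          # The args below make nsenter target PID 1 on the host (visible because hostPID: true)
          # and join its mount, UTS, IPC, and network namespaces before running bash from the host.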
          args:
            - '-t'
            - '1'
            - '-m'
            - '-u'
            - '-i'
            - '-n'
            - bash
            - '--'
            - '-c'
            - |
              # We are in the host PID namespace, so we can see the host's /proc so we can set sysctls
              sysctl -w net.core.rmem_max=2500000

              echo "Set sysctls... sleeping"
              tail -f /dev/null
          resources:
            limits:
              cpu: 50m
              memory: 64Mi
            requests:
              cpu: 50m
              memory: 64Mi
          volumeMounts:
            - name: kube-api-access-5hrk8
              readOnly: true
              mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
      hostNetwork: true
      hostPID: true
      hostIPC: true
      terminationGracePeriodSeconds: 1
      tolerations:
        - operator: Exists
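To check it took effect (the file name here is just illustrative):

kubectl apply -f node-sysctls.yaml
kubectl -n kube-system rollout status ds/node-sysctls

# then, on any node (e.g. via SSH or SSM):
sysctl net.core.rmem_max
# net.core.rmem_max = 2500000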

TroyKomodo avatar Oct 04 '22 20:10 TroyKomodo