
Support for securityContext for pods

Open lvikstro opened this issue 2 years ago • 42 comments

Installing the prometheus operator helm chart (https://prometheus-community.github.io/helm-charts, kube-prometheus-stack) with default values sets this securityContext for the prometheus instance:

securityContext:
  runAsGroup: 2000
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 2000

This makes the "prometheus-kube-prometheus-stack-prometheus-0" pod go into a crash loop with this error in the logs: "unable to create mmap-ed active query log"

Changing the prometheusSpec securityContext like this:

securityContext:
  runAsGroup: 0
  runAsNonRoot: true
  runAsUser: 0
  fsGroup: 2000

makes it all work, but then it is most likely running with root permissions on the file system.

This seems to be an issue with the CSI implementation, which doesn't support fsGroup handling or similar. For example, Longhorn sets "fsGroupPolicy: ReadWriteOnceWithFSType", which causes each volume to be examined at mount time to determine whether permissions should be applied recursively.
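
For illustration, fsGroupPolicy is a field on the cluster-scoped CSIDriver object; a minimal sketch of what such an object could look like for this driver (the manifest actually shipped with synology-csi may declare different values):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: csi.san.synology.com
spec:
  # tells the kubelet to examine each ReadWriteOnce volume with an fsType
  # at mount time and apply fsGroup ownership itself
  fsGroupPolicy: ReadWriteOnceWithFSType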

lvikstro avatar Mar 30 '22 05:03 lvikstro

@lvikstro I get the following error even when I specify runAsUser 0. Are you sure it works? 🤔

securityContext:
  runAsGroup: 0
  runAsNonRoot: true
  runAsUser: 0
  fsGroup: 2000
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  3m52s                 default-scheduler  Successfully assigned monitoring/prometheus-prometheus-kube-prometheus-prometheus-0 to node
  Warning  Failed     75s (x12 over 3m30s)  kubelet            Error: container's runAsUser breaks non-root policy (pod: "prometheus-prometheus-kube-prometheus-prometheus-0_monitoring(9e509cb5-e8af-4ac4-8ce0-c96fb6ca19c5)", container: init-config-reloader)
  Normal   Pulled     60s (x13 over 3m30s)  kubelet            Container image "quay.io/prometheus-operator/prometheus-config-reloader:v0.56.2" already present on machine
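
As far as I understand, the kubelet rejects this spec because runAsNonRoot: true can never be satisfied together with runAsUser: 0; a spec that really runs as root would need something like this sketch (at the cost of root-owned files on the volume):

securityContext:
  runAsGroup: 0
  runAsNonRoot: false  # required when runAsUser is 0
  runAsUser: 0
  fsGroup: 2000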

inductor avatar May 27 '22 23:05 inductor

Hi there @lvikstro, I ran into the exact same problem (and I mean down to the error message). I am using Kubernetes 1.21 with the newest release of the driver.

The security context does work, since this is the responsibility of Kubernetes, not the driver, but in order for it to work I had to configure some things:

  1. This apparently does not work with btrfs
  2. fsType in the storage class is deprecated and was replaced with the csi.storage.k8s.io/fstype parameter, as in the StorageClass below:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: normal
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.san.synology.com
# if all params are empty, synology CSI will choose an available location to create volume
parameters:
  dsm: "<dsmip>"
  location: /volume<n>
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true

This combination fixed it for me.

schoeppi5 avatar Jun 01 '22 09:06 schoeppi5

@schoeppi5

Hello.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: synology-iscsi-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.san.synology.com
parameters:
  dsm: '192.168.16.240'
  location: '/volume1'
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true

This is my SC spec, but it still doesn't work for me .-. Any ideas?

inductor avatar Jun 09 '22 03:06 inductor

I am also facing this issue. I am using the latest version of this repository, and PostgreSQL is not able to start because the owner of the NFS volume is not set properly (although I have set up the securityContext and StorageClass properly).

StatefulSet definition

apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: "2022-06-17T14:55:44Z"
  generation: 18
  labels:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: vaultarden
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: postgresql
    helm.sh/chart: postgresql-10.16.2
  name: vaultarden-postgresql
  namespace: vaultwarden
  resourceVersion: "7462911"
  uid: 239fd77b-e13b-4303-b007-431424ce526e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: vaultarden
      app.kubernetes.io/name: postgresql
      role: primary
  serviceName: vaultarden-postgresql-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: primary
        app.kubernetes.io/instance: vaultarden
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: postgresql
        helm.sh/chart: postgresql-10.16.2
        role: primary
      name: vaultarden-postgresql
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: primary
                  app.kubernetes.io/instance: vaultarden
                  app.kubernetes.io/name: postgresql
              namespaces:
              - vaultwarden
              topologyKey: kubernetes.io/hostname
            weight: 1
      automountServiceAccountToken: false
      containers:
      - env:
        - name: BITNAMI_DEBUG
          value: "true"
        - name: POSTGRESQL_PORT_NUMBER
          value: "5432"
        - name: POSTGRESQL_VOLUME_DIR
          value: /bitnami/postgresql
        - name: PGDATA
          value: /bitnami/postgresql/data
        - name: POSTGRES_POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: postgresql-postgres-password
              name: vaultarden-postgresql
        - name: POSTGRES_USER
          value: vaultwarden
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: postgresql-password
              name: vaultarden-postgresql
        - name: POSTGRES_DB
          value: vaultwarden
        - name: POSTGRESQL_ENABLE_LDAP
          value: "no"
        - name: POSTGRESQL_ENABLE_TLS
          value: "no"
        - name: POSTGRESQL_LOG_HOSTNAME
          value: "false"
        - name: POSTGRESQL_LOG_CONNECTIONS
          value: "false"
        - name: POSTGRESQL_LOG_DISCONNECTIONS
          value: "false"
        - name: POSTGRESQL_PGAUDIT_LOG_CATALOG
          value: "off"
        - name: POSTGRESQL_CLIENT_MIN_MESSAGES
          value: error
        - name: POSTGRESQL_SHARED_PRELOAD_LIBRARIES
          value: pgaudit
        image: docker.io/bitnami/postgresql:11.14.0-debian-10-r28
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - exec pg_isready -U "vaultwarden" -d "dbname=vaultwarden" -h 127.0.0.1
              -p 5432
          failureThreshold: 6
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: vaultarden-postgresql
        ports:
        - containerPort: 5432
          name: tcp-postgresql
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - -e
            - |
              exec pg_isready -U "vaultwarden" -d "dbname=vaultwarden" -h 127.0.0.1 -p 5432
              [ -f /opt/bitnami/postgresql/tmp/.initialized ] || [ -f /bitnami/postgresql/.initialized ]
          failureThreshold: 6
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        - mountPath: /bitnami/postgresql
          name: postgresql
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      terminationGracePeriodSeconds: 30
      volumes:
      - name: postgresql
        persistentVolumeClaim:
          claimName: test
      - emptyDir:
          medium: Memory
        name: dshm
      - emptyDir: {}
        name: data
  updateStrategy:
    type: RollingUpdate

StorageClass definition

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-06-21T07:53:58Z"
  name: synology-smb-storage
  resourceVersion: "7438914"
  uid: 19290577-5044-427d-a4a6-5532b83c49bb
parameters:
  csi.storage.k8s.io/node-stage-secret-name: cifs-csi-credentials
  csi.storage.k8s.io/node-stage-secret-namespace: synology-csi
  dsm: 192.168.30.13
  fsType: ext4
  location: /volume1
  protocol: smb
provisioner: csi.san.synology.com
reclaimPolicy: Retain
volumeBindingMode: Immediate

These are the logs from PostgreSQL when using this configuration:

postgresql 11:04:36.12
postgresql 11:04:36.12 Welcome to the Bitnami postgresql container
postgresql 11:04:36.12 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
postgresql 11:04:36.12 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
postgresql 11:04:36.13
postgresql 11:04:36.13 DEBUG ==> Configuring libnss_wrapper...
postgresql 11:04:36.14 INFO  ==> ** Starting PostgreSQL setup **
postgresql 11:04:36.18 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 11:04:36.18 INFO  ==> Loading custom pre-init scripts...
postgresql 11:04:36.19 INFO  ==> Initializing PostgreSQL database...
postgresql 11:04:36.19 DEBUG ==> Ensuring expected directories/files exist...
mkdir: cannot create directory ‘/bitnami/postgresql/data’: Permission denied

Anyway, if I change the directory to be an emptyDir (instead of a PVC from Synology NFS), it works, and I can verify that the owner is 1001 (which I set in the securityContext):

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /bitnami/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok

Success. You can now start the database server using:

    /opt/bitnami/postgresql/bin/pg_ctl -D /bitnami/postgresql/data -l logfile start

postgresql 11:09:21.51 INFO  ==> Starting PostgreSQL in background...
waiting for server to start....2022-06-21 11:09:21.540 GMT [66] LOG:  listening on IPv6 address "::1", port 5432
2022-06-21 11:09:21.540 GMT [66] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-06-21 11:09:21.543 GMT [66] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-06-21 11:09:21.555 GMT [67] LOG:  database system was shut down at 2022-06-21 11:09:21 GMT
2022-06-21 11:09:21.558 GMT [66] LOG:  database system is ready to accept connections
 done
server started
CREATE DATABASE
postgresql 11:09:21.97 INFO  ==> Changing password of postgres
ALTER ROLE
postgresql 11:09:22.01 INFO  ==> Creating user vaultwarden
CREATE ROLE
postgresql 11:09:22.03 INFO  ==> Granting access to "vaultwarden" to the database "vaultwarden"
GRANT
ALTER DATABASE
postgresql 11:09:22.06 INFO  ==> Setting ownership for the 'public' schema database "vaultwarden" to "vaultwarden"
ALTER SCHEMA
postgresql 11:09:22.10 INFO  ==> Configuring replication parameters
postgresql 11:09:22.14 INFO  ==> Configuring synchronous_replication
postgresql 11:09:22.14 INFO  ==> Configuring fsync
postgresql 11:09:22.18 INFO  ==> Loading custom scripts...
postgresql 11:09:22.19 INFO  ==> Enabling remote connections
postgresql 11:09:22.20 INFO  ==> Stopping PostgreSQL...
waiting for server to shut down....2022-06-21 11:09:22.214 GMT [66] LOG:  received fast shutdown request
2022-06-21 11:09:22.216 GMT [66] LOG:  aborting any active transactions
2022-06-21 11:09:22.220 GMT [66] LOG:  background worker "logical replication launcher" (PID 73) exited with exit code 1
2022-06-21 11:09:22.221 GMT [68] LOG:  shutting down
2022-06-21 11:09:22.239 GMT [66] LOG:  database system is shut down
 done
server stopped
postgresql 11:09:22.32 INFO  ==> ** PostgreSQL setup finished! **

postgresql 11:09:22.37 INFO  ==> ** Starting PostgreSQL **
2022-06-21 11:09:22.393 GMT [1] LOG:  pgaudit extension initialized
2022-06-21 11:09:22.395 GMT [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-06-21 11:09:22.395 GMT [1] LOG:  listening on IPv6 address "::", port 5432
2022-06-21 11:09:22.398 GMT [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-06-21 11:09:22.412 GMT [156] LOG:  database system was shut down at 2022-06-21 11:09:22 GMT
2022-06-21 11:09:22.415 GMT [1] LOG:  database system is ready to accept connections

Owner of the directory

$ ls -l /bitnami/
total 8
drwxrwsrwx. 3 root 1001 4096 Jun 21 11:09 postgresql

$ ls -l /bitnami/postgresql
total 8
drwx------. 19 1001 1001 4096 Jun 21 11:09 data

$ ls -l /bitnami/postgresql/data
total 176
drwx------. 6 1001 root 4096 Jun 21 11:09 base
drwx------. 2 1001 root 4096 Jun 21 11:10 global
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_commit_ts
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_dynshmem
-rw-------. 1 1001 root 1636 Jun 21 11:09 pg_ident.conf
drwx------. 4 1001 root 4096 Jun 21 11:09 pg_logical
drwx------. 4 1001 root 4096 Jun 21 11:09 pg_multixact
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_notify
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_replslot
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_serial
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_snapshots
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_stat
drwx------. 2 1001 root 4096 Jun 21 11:10 pg_stat_tmp
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_subtrans
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_tblspc
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_twophase
-rw-------. 1 1001 root    3 Jun 21 11:09 PG_VERSION
drwx------. 3 1001 root 4096 Jun 21 11:09 pg_wal
drwx------. 2 1001 root 4096 Jun 21 11:09 pg_xact
-rw-------. 1 1001 root   88 Jun 21 11:09 postgresql.auto.conf
-rw-------. 1 1001 root  249 Jun 21 11:09 postmaster.opts
-rw-------. 1 1001 root   79 Jun 21 11:09 postmaster.pid

jjdiazgarcia avatar Jun 21 '22 11:06 jjdiazgarcia

Depending on the K8s version you are using, there is a problem with the DelegateFSGroupToCSIDriver feature gate. It is enabled by default starting with K8s 1.23.

Normally, the kubelet is responsible for "fulfilling" the securityContext chown and chmod requirements. This feature gate allows the kubelet to delegate that work to the CSI driver, if the driver supports it.

The Synology CSI driver declares that it is able to do that, but just isn't doing it.

The quick workaround is to disable this feature gate and always let the kubelet handle it.

The proper solution would be for the CSI driver either to not declare this capability, or to actually implement it.

schoeppi5 avatar Aug 17 '22 08:08 schoeppi5

I am also running into this issue with OpenShift 4.11 (based on k8s 1.24). CSI driver provisions and mounts the volume with no issues, but pod instantiation always fails with Permission Denied.

rblundon avatar Sep 06 '22 01:09 rblundon

nfs

The kubelet isn't doing the chown for NFS volumes. You'll have to use an init container for that.
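
A minimal sketch of such an init container, using the PostgreSQL StatefulSet above as context (the busybox tag and UID/GID 1001 are illustrative, and root squash on the NAS side may still block the chown):

initContainers:
  - name: volume-permissions
    image: busybox:1.36
    # chown the mounted volume to the UID/GID the app runs as; this needs root
    command: ["sh", "-c", "chown -R 1001:1001 /bitnami/postgresql"]
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: postgresql
        mountPath: /bitnami/postgresql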

schoeppi5 avatar Sep 06 '22 05:09 schoeppi5

@rblundon Do you have a bit more info? Maybe the StorageClass you're using and the pod spec.

schoeppi5 avatar Sep 06 '22 05:09 schoeppi5

Disabling DelegateFSGroupToCSIDriver worked perfectly for me btw. Thank you so much!

inductor avatar Sep 06 '22 15:09 inductor

Disabling DelegateFSGroupToCSIDriver worked perfectly for me btw. Thank you so much!

How do I disable DelegateFSGroupToCSIDriver on an existing cluster?

Ryanznoco avatar Sep 06 '22 16:09 Ryanznoco

https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

schoeppi5 avatar Sep 06 '22 16:09 schoeppi5

@Ryanznoco Hi, it depends on what Kubernetes distribution you use but you need to look up "feature gate"

inductor avatar Sep 06 '22 23:09 inductor

https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

It is not working for me. I added "--feature-gates=DelegateFSGroupToCSIDriver=false" to the api-server manifest, but after a complete redeploy my Redis pod still gets a "Permission denied" error. My Kubernetes version is 1.23.10.

Ryanznoco avatar Sep 07 '22 15:09 Ryanznoco

You need to add the feature gate to every kubelet config in your cluster, since this is a kubelet gate, not an API server one. Depending on your installation method, there might be an easier way of changing the kubelet config.

schoeppi5 avatar Sep 07 '22 16:09 schoeppi5

Sorry, I don't know how to do this. Can I delete the line csi.NodeServiceCapability_RPC_VOLUME_MOUNT_GROUP, in the source code, then recompile and install the CSI driver?

Ryanznoco avatar Sep 07 '22 16:09 Ryanznoco

That would be, like, a lot more work than disabling the feature gate.

Please read the whole message before proceeding.

There are a few ways of doing this. The manual approach should work most of the time; the others depend on your installation method.

Manually

If you have K8s installed locally, on VMs, or any other way, where your installation isn't managed by a cloud provider, you can:

  1. Go to one of your nodes
  2. Find the kubelet process (pid): ps -ef | grep /usr/bin/kubelet
  3. Find its command line: cat /proc/<pid>/cmdline
  4. Find the path of the kubelet config, which is the path after --config= (probably /var/lib/kubelet/config.yaml)
  5. In that file, add:
featureGates:
  DelegateFSGroupToCSIDriver: false
  6. Restart the kubelet service (systemctl restart kubelet)
  7. Congrats, you just disabled a feature gate in the kubelet. Now repeat steps 5 & 6 on all other worker nodes.

kubeadm

kubeadm keeps the current kubelet config in a configmap on the cluster in the kube-system namespace, called kubelet-config.

You can find the whole process well documented here. It works pretty much the same as the manual approach.
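
For reference, the data in that ConfigMap is a serialized KubeletConfiguration; the fragment you end up with should look roughly like this (kubelet.config.k8s.io/v1beta1 schema):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DelegateFSGroupToCSIDriver: false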

talos linux

Just putting that one in here, since I am working with that right now:

In your worker configuration, add the feature gates you want to enable or disable, as depicted in their docs here.
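
A minimal sketch of what that could look like for a worker, assuming the machine.kubelet.extraConfig mechanism (check the Talos docs for the authoritative schema):

machine:
  kubelet:
    # extraConfig is merged into the node's KubeletConfiguration
    extraConfig:
      featureGates:
        DelegateFSGroupToCSIDriver: false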

Others

For other installation methods, you'll have to consult their respective docs on how to disable kubelet feature gates.

But yes, you could probably also remove this capability, recompile, rebuild the image, and patch your deployment, but no guarantee on that.

At this point, I might as well open a PR for this issue. We'll see.

Hope this helps you and you get the driver running correctly. Let me know how it went; I'm invested now 😆

schoeppi5 avatar Sep 07 '22 17:09 schoeppi5

This makes sense, but how would it be accomplished on OpenShift?

rblundon avatar Sep 08 '22 00:09 rblundon

Sorry, I don't have any experience in OpenShift

schoeppi5 avatar Sep 08 '22 07:09 schoeppi5

@rblundon for OpenShift problems you really SHOULD ask Red Hat

inductor avatar Sep 08 '22 14:09 inductor

@schoeppi5 Thank you for your help. I solved it by modifying the source code. I also tried modifying the kubeadm configmap, but that still didn't work. Looking forward to the new release with your PR. @inductor And thank you too.

Ryanznoco avatar Sep 09 '22 15:09 Ryanznoco

@Ryanznoco - I will try reinstalling the CSI driver from your repo. @inductor - I work for RH, but in sales, not engineering. Pretty sure this wouldn't get any traction, as it looks to be an issue with the Synology driver not doing what it advertises it is capable of doing.

rblundon avatar Sep 09 '22 16:09 rblundon

The problem here is that this CSI driver just declares the capability but does not do anything to actually implement it, unlike other CSI drivers. See an example here: https://github.com/kubernetes-csi/csi-driver-smb/pull/379/files

We should point out this limitation in the README, because more and more k8s clusters and distros implement Security Context nowadays, and the days when everything likely ran as root are hopefully gone.

mazzy89 avatar Nov 26 '22 14:11 mazzy89

DelegateFSGroupToCSIDriver is locked to enabled as of 1.26, since the feature is considered GA.

inductor avatar Jan 04 '23 15:01 inductor

I also just upgraded to v1.26.1, and the kubelet wouldn't allow me to disable the feature gate anymore. This is a pretty major bug, which means the CSI driver doesn't work on k8s v1.26 or later.

It's a pretty easy fix. @chihyuwu would you mind cutting a new release please?

vaskozl avatar Jan 31 '23 19:01 vaskozl

hi @vaskozl, Of course! Thanks for letting us know about it! A new version without the RPC_VOLUME_MOUNT_GROUP capability will be released soon to make sure that the plugin is compatible with k8s v1.26. We'll definitely put more emphasis on CSI's flexibility and compatibility in the future as well!

chihyuwu avatar Feb 03 '23 09:02 chihyuwu

A new version without the RPC_VOLUME_MOUNT_GROUP capability will be released soon to make sure that the plugin is compatible with k8s v1.26.

A new update of synology/synology-csi:latest is available now.

The problem here is that this CSI just declare the capability but it does not do anything to actually implement that like for other CSI. See here an example https://github.com/kubernetes-csi/csi-driver-smb/pull/379/files

In the previous version, we tried to implement securityContext support like the example but still missed something. We'll check and fix it in the future.

chihyuwu avatar Feb 06 '23 07:02 chihyuwu

@chihyuwu Would you be willing to publish the image on GitHub instead of Docker Hub? Rate limits can be problematic in environments sharing the same outgoing global IP address(es).

inductor avatar Feb 06 '23 16:02 inductor

When will it be released?

camaeel avatar Mar 30 '23 21:03 camaeel

Thanks @schoeppi5 for the detailed description! Adding the K3s instructions for those that might need them until it is resolved:

  1. Add these lines to /etc/rancher/k3s/config.yaml:
kubelet-arg:
  - feature-gates=DelegateFSGroupToCSIDriver=false
  2. systemctl restart k3s

Works for me on K3s v1.25.3.

tomasodehnal avatar Apr 01 '23 11:04 tomasodehnal

Works on <=1.25. On 1.26 this flag is locked and can no longer be disabled.

camaeel avatar Apr 01 '23 12:04 camaeel