noobaa-core icon indicating copy to clipboard operation
noobaa-core copied to clipboard

Bug: failing to change ownership of the NFS based PVC for PostgreSQL pod by using kube_pv_chown utility

Open maratsal opened this issue 2 years ago • 7 comments

Environment info

  • NooBaa Version: 4.9.10, 4.10.5 (NooBaa operator version)
  • Platform: OpenShift 4.10.22
  • Storage class from NetApp filer provided over Trident CSI (backendType: ontap-nas-economy)

Actual behavior

  1. pod noobaa-db-pg-0 stuck in crash loop of the init container with following error message:
uid change has been identified - will change from uid: 99 to new uid: 10001
setting permissions of /var/lib/pgsql for user 10001
Error:got error when changing ownership of /var/lib/pgsql Error: Operation not permitted

real	0m0.005s
user	0m0.003s
sys	0m0.001s
Error from server (BadRequest): container "initialize-database" in pod "noobaa-db-pg-0" is waiting to start: PodInitializing

Expected behavior

  1. Pod starts.

Steps to reproduce

  1. Deploy ODF in standalone MCG mode (or noobaa operator) with postgres DB backend PVC from NFS storage.

More information - Screenshots / Logs / Other output

It might be because NFS storage is exported with root_squash option?

maratsal avatar Aug 01 '22 19:08 maratsal

Thanks @maratsal

root squash could very much be the root cause, our db init containers are a mess of trying to fix pvc permissions using root group (gid 0) and hacks to overcome postgres hugepages issue.

I wonder if you manually went to that pvc filesystem as root (or as current owner uid 99) and did a chown -R 10001:0 <db-volume-path> - would the init container run ok?

@dannyzaken @jackyalbo and @baum would probably know how to fix this problem.

guymguym avatar Aug 01 '22 22:08 guymguym

cannot change ownership with any ID (tried 99 and 10001), but have no problems with creating files and folder with any ID (please see Manual permissions change test below).

So maybe if PVC is from NFS storage we should not be running that permissions change in this code and having additional check like below:

prepare_postgres_pv() {
  local dir="/var/lib/pgsql"
  local dir_id=$(stat -c '%u' ${dir})
+  local dir_type=$(stat -f -c %T ${dir})
  local current_id=$(id -u)

  # change ownership and permissions of mongo db path
-  if [ "${dir_id}" != "${current_id}" ] 
+  if [ "${dir_id}" != "${current_id}" ] && [ "${dir_type}" != "nfs" ]
  then
    echo "uid change has been identified - will change from uid: ${dir_id} to new uid: ${current_id}"
    time ${KUBE_PV_CHOWN} postgres ${current_id}
  fi
}

Manual permissions change test:

[myuser@my-linux-host ~]$ oc rsh busybox-bbc4dc98-8gfnc # user id 99
~ $ id
uid=99(99) gid=0(root)
~ $ ls -la
total 20
dr-xr-xr-x    1 root     root            62 Aug  1 22:58 .
dr-xr-xr-x    1 root     root            62 Aug  1 22:58 ..
drwxr-xr-x    2 root     root         12288 Jul 29 01:32 bin
drwxr-xr-x    5 root     root           360 Aug  1 22:58 dev
drwxr-xr-x    1 root     root            66 Aug  1 22:58 etc
drwxr-xr-x    2 nobody   nobody           6 Jul 29 01:32 home
dr-xr-xr-x  309 root     root             0 Aug  1 22:58 proc
drwx------    2 root     root             6 Jul 29 01:32 root
drwxr-xr-x    3 root     root            42 Aug  1 22:58 run
dr-xr-xr-x   13 root     root             0 Jul 29 14:12 sys
drwxrwxrwx    2 99       99            4096 Aug  1 22:58 tmp
drwxr-xr-x    3 root     root            18 Jul 29 01:32 usr
drwxr-xr-x    1 root     root            17 Aug  1 22:58 var
~ $ df -h /tmp
Filesystem                Size      Used Available Use% Mounted on
10.32.10.245:/trident_qtree_pool_trident_GXYUPUKEEO/trident_pvc_bc950669_6230_400a_9c97_0150658c8450
                         50.0G         0     50.0G   0% /tmp
~ $ chown 10001 /tmp
chown: /tmp: Operation not permitted
~ $ mkdir /tmp/folder-id-99
~ $ ls -la /tmp/
total 8
drwxrwxrwx    3 99       99            4096 Aug  1 22:59 .
dr-xr-xr-x    1 root     root            62 Aug  1 22:58 ..
drwxr-xr-x    2 99       99            4096 Aug  1 22:59 folder-id-99
~ $ chown 10001 /tmp/folder-id-99/
chown: /tmp/folder-id-99/: Operation not permitted
~ $ 
command terminated with exit code 1
[myuser@my-linux-host ~]$ oc edit deploy busybox # change user id to 10001
deployment.apps/busybox edited
[myuser@my-linux-host ~]$ oc get pods | grep busybox
busybox-69fcd8b488-mpvsc                           1/1     Running       0          12s
busybox-bbc4dc98-8gfnc                             1/1     Terminating   0          2m2s
[myuser@my-linux-host ~]$ oc rsh busybox-69fcd8b488-mpvsc 
~ $ id
uid=10001(10001) gid=0(root)
~ $ cd /tmp
/tmp $ ls -la
total 8
drwxrwxrwx    3 99       99            4096 Aug  1 22:59 .
dr-xr-xr-x    1 root     root            62 Aug  1 23:00 ..
drwxr-xr-x    2 99       99            4096 Aug  1 22:59 folder-id-99
/tmp $ mkdir folder-id-10001
/tmp $ touch file-id-10001
/tmp $ ls -la
total 12
drwxrwxrwx    4 99       99            4096 Aug  1 23:01 .
dr-xr-xr-x    1 root     root            62 Aug  1 23:00 ..
-rw-r--r--    1 10001    99               0 Aug  1 23:01 file-id-10001
drwxr-xr-x    2 10001    99            4096 Aug  1 23:01 folder-id-10001
drwxr-xr-x    2 99       99            4096 Aug  1 22:59 folder-id-99
/tmp $ rm -rf *
/tmp $ ls -la
total 4
drwxrwxrwx    2 99       99            4096 Aug  1 23:01 .
dr-xr-xr-x    1 root     root            62 Aug  1 23:00 ..
/tmp $ 
[myuser@my-linux-host ~]$

maratsal avatar Aug 01 '22 23:08 maratsal

So maybe if PVC is from NFS storage we should not be running that permissions change

I am not sure if not running it will work, because we need to have permissions to write to all those files to run the database...

here is more context on this issue (notice answers 2 and 3 as well) -

https://stackoverflow.com/questions/43544370/kubernetes-how-to-set-volumemount-user-group-and-file-permissions

guymguym avatar Aug 01 '22 23:08 guymguym

I just tested it and it could start with no issues with my suggested changes (mounted modified script from the configmap):

[myuser@my-linux-host ~]$ oc logs -f --all-containers noobaa-db-pg-0 
+ set -x
+ export PGDATA=/var/lib/pgsql/data/userdata
+ PGDATA=/var/lib/pgsql/data/userdata
+ '[' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
+ p=/opt/rh/rh-postgresql12/root/usr/bin/postgres
+ '[' '!' -x /opt/rh/rh-postgresql12/root/usr/bin/postgres ']'
+ p=/usr/bin/postgres
+ mv /usr/bin/postgres /usr/bin/postgres.orig
+ echo exec /usr/bin/postgres.orig '"$@"' -c huge_pages=off
+ chmod 755 /usr/bin/postgres
+ sed -i -e 's/^\(postgres:[^:]\):[0-9]*:[0-9]*:/\1:10001:0:/' /etc/passwd
+ sed -i -e 's/^exec.*$/exit 0/' -e 's/^pg_ctl\sstart.*/pg_ctl start || true/' /usr/bin/run-postgresql
+ su postgres -c 'bash -x /usr/bin/run-postgresql'
+ export ENABLE_REPLICATION=false
+ ENABLE_REPLICATION=false
+ set -eu
++ cgroup-limits
+ export_vars='MAX_MEMORY_LIMIT_IN_BYTES=9223372036854775807
MEMORY_LIMIT_IN_BYTES=524288000
NUMBER_OF_CORES=4'
+ export MAX_MEMORY_LIMIT_IN_BYTES=9223372036854775807 MEMORY_LIMIT_IN_BYTES=524288000 NUMBER_OF_CORES=4
+ MAX_MEMORY_LIMIT_IN_BYTES=9223372036854775807
+ MEMORY_LIMIT_IN_BYTES=524288000
+ NUMBER_OF_CORES=4
+ source /usr/share/container-scripts/postgresql/common.sh
++ export POSTGRESQL_MAX_CONNECTIONS=100
++ POSTGRESQL_MAX_CONNECTIONS=100
++ export POSTGRESQL_MAX_PREPARED_TRANSACTIONS=0
++ POSTGRESQL_MAX_PREPARED_TRANSACTIONS=0
++ [[ '' == \t\r\u\e ]]
++ [[ -z 524288000 ]]
++ shared_buffers_computed=125MB
++ effective_cache=250MB
++ export POSTGRESQL_SHARED_BUFFERS=125MB
++ POSTGRESQL_SHARED_BUFFERS=125MB
++ export POSTGRESQL_EFFECTIVE_CACHE_SIZE=250MB
++ POSTGRESQL_EFFECTIVE_CACHE_SIZE=250MB
++ export POSTGRESQL_RECOVERY_FILE=/var/lib/pgsql/openshift-custom-recovery.conf
++ POSTGRESQL_RECOVERY_FILE=/var/lib/pgsql/openshift-custom-recovery.conf
++ export POSTGRESQL_CONFIG_FILE=/var/lib/pgsql/openshift-custom-postgresql.conf
++ POSTGRESQL_CONFIG_FILE=/var/lib/pgsql/openshift-custom-postgresql.conf
++ postinitdb_actions=
++ shopt -s dotglob
++ shopt -s extglob
+ set_pgdata
+ export PGDATA=/var/lib/pgsql/data/userdata
+ PGDATA=/var/lib/pgsql/data/userdata
+ mkdir -p /var/lib/pgsql/data/userdata
+ '[' -e /var/lib/pgsql/data/PG_VERSION ']'
+ chmod 700 /var/lib/pgsql/data/userdata
+ process_extending_files /opt/app-root/src/postgresql-pre-start /usr/share/container-scripts/postgresql/pre-start
+ local filename dir
++ get_matched_files '*.sh' /opt/app-root/src/postgresql-pre-start /usr/share/container-scripts/postgresql/pre-start
++ sort -u
++ local 'pattern=*.sh' dir
++ shift
++ for dir in "$@"
++ test -d /opt/app-root/src/postgresql-pre-start
++ continue
++ for dir in "$@"
++ test -d /usr/share/container-scripts/postgresql/pre-start
++ continue
+ read filename
+ for dir in "$@"
+ local file=/opt/app-root/src/postgresql-pre-start/
+ test -f /opt/app-root/src/postgresql-pre-start/
+ for dir in "$@"
+ local file=/usr/share/container-scripts/postgresql/pre-start/
+ test -f /usr/share/container-scripts/postgresql/pre-start/
+ read filename
+ check_env_vars
+ [[ -v POSTGRESQL_USER ]]
+ [[ -v POSTGRESQL_USER ]]
+ [[ -v POSTGRESQL_PASSWORD ]]
+ [[ -v POSTGRESQL_DATABASE ]]
+ '[' 6 -le 63 ']'
+ '[' 6 -le 63 ']'
+ postinitdb_actions+=,simple_db
+ '[' -v POSTGRESQL_ADMIN_PASSWORD ']'
+ '[' -v POSTGRESQL_MIGRATION_REMOTE_HOST -a -v POSTGRESQL_MIGRATION_ADMIN_PASSWORD ']'
+ case "$postinitdb_actions" in
+ generate_passwd_file
++ id -u
+ export USER_ID=10001
+ USER_ID=10001
++ id -g
+ export GROUP_ID=0
+ GROUP_ID=0
+ grep -v -e '^postgres' -e '^10001' /etc/passwd
+ echo 'postgres:x:10001:0:PostgreSQL Server:/var/lib/pgsql:/bin/bash'
+ export LD_PRELOAD=libnss_wrapper.so
+ LD_PRELOAD=libnss_wrapper.so
+ export NSS_WRAPPER_PASSWD=/var/lib/pgsql/passwd
+ NSS_WRAPPER_PASSWD=/var/lib/pgsql/passwd
+ export NSS_WRAPPER_GROUP=/etc/group
+ NSS_WRAPPER_GROUP=/etc/group
+ generate_postgresql_config
+ envsubst
+ '[' false == true ']'
+ should_hack_data_sync_retry
++ uname -p
+ '[' x86_64 == x86_64 ']'
+ return 1
+ shopt -s nullglob
+ PG_INITIALIZED=false
+ '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
+ initialize_database
+ initdb_wrapper initdb
+ LANG=en_US.utf8
+ initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  C
  CTYPE:    en_US.utf8
  MESSAGES: en_US.utf8
  MONETARY: en_US.utf8
  NUMERIC:  en_US.utf8
  TIME:     en_US.utf8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok


Success. You can now start the database server using:

    pg_ctl -D /var/lib/pgsql/data/userdata -l logfile start

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
+ cat
+ cat
+ PG_INITIALIZED=:
+ pg_ctl start
waiting for server to start....2022-08-02 14:18:52.849 UTC [39] LOG:  starting PostgreSQL 12.11 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-08-02 14:18:52.849 UTC [39] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-08-02 14:18:52.849 UTC [39] LOG:  listening on IPv6 address "::", port 5432
2022-08-02 14:18:52.856 UTC [39] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-08-02 14:18:52.862 UTC [39] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-02 14:18:52.882 UTC [39] LOG:  redirecting log output to logging collector process
2022-08-02 14:18:52.882 UTC [39] HINT:  Future log output will appear in directory "log".
 done
server started
+ pg_isready
/var/run/postgresql:5432 - accepting connections
+ :
+ process_extending_files /opt/app-root/src/postgresql-init /usr/share/container-scripts/postgresql/init
+ local filename dir
++ get_matched_files '*.sh' /opt/app-root/src/postgresql-init /usr/share/container-scripts/postgresql/init
++ local 'pattern=*.sh' dir
++ shift
++ for dir in "$@"
++ test -d /opt/app-root/src/postgresql-init
++ continue
++ for dir in "$@"
++ test -d /usr/share/container-scripts/postgresql/init
++ continue
++ sort -u
+ read filename
+ for dir in "$@"
+ local file=/opt/app-root/src/postgresql-init/
+ test -f /opt/app-root/src/postgresql-init/
+ for dir in "$@"
+ local file=/usr/share/container-scripts/postgresql/init/
+ test -f /usr/share/container-scripts/postgresql/init/
+ read filename
+ migrate_db
+ test ,simple_db = ,migration
+ return 0
+ create_users
+ [[ ,,simple_db, = *,simple_db,* ]]
+ createuser noobaa
+ createdb --owner=noobaa nbcore
+ '[' -v POSTGRESQL_MASTER_USER ']'
+ process_extending_files /opt/app-root/src/postgresql-start /usr/share/container-scripts/postgresql/start
+ local filename dir
++ get_matched_files '*.sh' /opt/app-root/src/postgresql-start /usr/share/container-scripts/postgresql/start
++ local 'pattern=*.sh' dir
++ shift
++ for dir in "$@"
++ test -d /opt/app-root/src/postgresql-start
++ continue
waiting for server to start....2022-08-02 14:18:56.186 UTC [22] LOG:  starting PostgreSQL 12.11 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-08-02 14:18:56.189 UTC [22] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
++ for dir in "$@"
++ test -d /usr/share/container-scripts/postgresql/start
2022-08-02 14:18:56.196 UTC [22] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-08-02 14:18:56.282 UTC [22] LOG:  redirecting log output to logging collector process
2022-08-02 14:18:56.282 UTC [22] HINT:  Future log output will appear in directory "log".
 done
++ find -L /usr/share/container-scripts/postgresql/start -maxdepth 1 -type f -name '*.sh' -printf '%f\n'
++ sort -u
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
+ read filename
server started
/var/run/postgresql:5432 - accepting connections
+ for dir in "$@"
+ local file=/opt/app-root/src/postgresql-start/set_passwords.sh
+ test -f /opt/app-root/src/postgresql-start/set_passwords.sh
+ for dir in "$@"
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
+ local file=/usr/share/container-scripts/postgresql/start/set_passwords.sh
+ test -f /usr/share/container-scripts/postgresql/start/set_passwords.sh
+ echo '=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...'
+ source /usr/share/container-scripts/postgresql/start/set_passwords.sh
ALTER ROLE
++ [[ ,,simple_db, = *,simple_db,* ]]
waiting for server to shut down.... done
++ _psql --set=username=noobaa --set=password=SeX263+iGcHjJg==
++ psql --set ON_ERROR_STOP=1 --set=username=noobaa --set=password=SeX263+iGcHjJg==
server stopped
Starting server...
2022-08-02 14:18:56.553 UTC [1] LOG:  starting PostgreSQL 12.11 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-08-02 14:18:56.553 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
ALTER ROLE
++ '[' -v POSTGRESQL_MASTER_USER ']'
++ '[' -v POSTGRESQL_ADMIN_PASSWORD ']'
2022-08-02 14:18:56.553 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2022-08-02 14:18:56.558 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-08-02 14:18:56.562 UTC [1] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
+ set -e
+ break
+ read filename
+ pg_ctl stop
waiting for server to shut down.... done
2022-08-02 14:18:56.629 UTC [1] LOG:  redirecting log output to logging collector process
2022-08-02 14:18:56.629 UTC [1] HINT:  Future log output will appear in directory "log".
server stopped
Starting server...
+ unset_env_vars
+ unset POSTGRESQL_DATABASE POSTGRESQL_USER POSTGRESQL_PASSWORD POSTGRESQL_ADMIN_PASSWORD
+ echo 'Starting server...'
+ exit 0
^C
[myuser@my-linux-host ~]$ oc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
noobaa-core-0                                      1/1     Running   0          18h
noobaa-db-pg-0                                     1/1     Running   0          56s
noobaa-operator-698c99d7fc-qjvj2                   1/1     Running   0          3d20h
ocs-metrics-exporter-5594fd8456-v6xvh              1/1     Running   0          3d23h
ocs-operator-74f7b6dcf-m7swl                       1/1     Running   0          3d23h
odf-console-dc88447b5-xnf4t                        1/1     Running   0          3d23h
odf-operator-controller-manager-5b66b7969f-kvfp8   2/2     Running   0          3d23h
rook-ceph-operator-5f68dc6b44-j2sf7                1/1     Running   0          3d23h
[myuser@my-linux-host ~]$

maratsal avatar Aug 02 '22 14:08 maratsal

I now see the same problem on podman machine + kind clsuter.

$ kubectl logs noobaa-db-pg-0 -c init
uid change has been identified - will change from uid: 0 to new uid: 10001
setting permissions of /var/lib/pgsql for user 10001
Error:got error when changing ownership of /var/lib/pgsql Error: Operation not permitted

@dannyzaken what is our direction here??

guymguym avatar Aug 05 '22 15:08 guymguym

I feel like my proposed code should work just fine, because we have permissions on group level and that's why we don't have problems with writing to the NFS share with any user as I posted above.

here is recommendations around it from Red Hat:

Adapting Containerfiles for OpenShift When you write or change a Containerfile that builds an image to run on an OpenShift cluster, you need to address the following:

  • Directories and files that are read from or written to by processes in the container should be owned by the root group and have group read or group write permission.
  • Files that are executable should have group execute permissions.
  • The processes running in the container must not listen on privileged ports (that is, ports below 1024), because they are not running as privileged users. Adding the following RUN instruction to your Containerfile sets the directory and file permissions to allow users in the root group to access them in the container:
RUN chgrp -R 0 directory && \
chmod -R g=u directory

The user account that runs the container is always a member of the root group, hence the container can read and write to this directory. The root group does not have any special permissions (unlike the root user) which minimizes security risks with this configuration. The g=u argument from the chmod command makes the group permissions equal to the owner user permissions, which by default are read and write. You can use the g+rwX argument with the same results.

maratsal avatar Aug 05 '22 15:08 maratsal

@dannyzaken will you assign this?

guymguym avatar Aug 08 '22 07:08 guymguym

If I am not mistaken, I think we can get rid of all the init containers hacks and use fsGroup and fsGroupChangePolicy https://kubernetes.io/blog/2020/12/14/kubernetes-release-1.20-fsgroupchangepolicy-fsgrouppolicy/

dannyzaken avatar Jan 17 '23 09:01 dannyzaken

@dannyzaken, @guymguym I recently picked up this issue. I am not sure if we can replace our init container hacks with fsUser, fsGroup and fsGroupChangePolicy.

Kubernetes will also do exactly what we are doing (SetVolumeOwnership) which implies that it might not really help us in this particular scenario where NFS mount is exported with root_squash hence NFS is performing operations with nobody:nogroup - when root (AFAIK) - resulting in operation getting denied.

Maybe, instead what we can do is check in our init container if we can read and write to the directory, if we can - skip any ownership changes or else change ownership as usual.

PS: If we can't change the ownership AND we can't read/write to the FS then I think that's a good enough reason to deny pod start (which is not the case here).

tangledbytes avatar Mar 17 '23 14:03 tangledbytes