nextcloud-nginx container crashlooping after securityContext update; `/var/www/html/config` always owned by root
Description
I've edited this description to give the full context of how we got here, since this issue is getting kind of long; it needed to be tested in a lot of different ways, which led me in several directions.
This issue is a continuation of the conversation started after #269 was merged. I was originally trying to change podSecurityContext.runAsUser and podSecurityContext.runAsGroup to 33 because I was trying to diagnose why the /var/www/html/config directory was always owned by root. I am deploying the nextcloud helm chart using persistent volumes on k3s with the default local path provisioner.
I learned that podSecurityContext.fsGroup was always being set to 82 any time you used nginx.enabled and didn't set podSecurityContext.fsGroup explicitly, so I submitted a draft PR to fix it so that it checks image.flavor for alpine: https://github.com/nextcloud/helm/pull/379
Through the comments here you can see other things I'm currently testing, because I'm still not sure if it's just the local path provisioner on k3s, or k3s itself, or what, but the best I can offer so far is 🤷. I'll update this issue description with more clarity as it comes.
Original Issue that was opened on Jan 23
The nginx container in the nextcloud pod won't start and complains of a read-only file system, even if I only try to set nextcloud.securityContext.
I have created a new cluster and deployed nextcloud with the securityContext parameters from the values.yaml of this repo, including the nginx security context. My entire values.yaml is here, but the parts that matter are:
securityContext parameters in my old `values.yaml`
nextcloud:
  # securityContext parameters. For example you may need to define runAsNonRoot directive
  securityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
  # securityContext parameters. For example you may need to define runAsNonRoot directive
  podSecurityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
...
nginx:
  ## You need to set an fpm version of the image for nextcloud if you want to use nginx!
  enabled: true
  image:
    repository: nginx
    tag: alpine
    pullPolicy: Always
  # this is copied almost directly from the values.yaml, but I changed readOnlyRootFilesystem to false while testing
  securityContext:
    runAsUser: 82
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
The nextcloud pod is in a CrashLoopBackOff state; the offending container is nginx, and these are its logs:
2023-01-23T15:44:59.428798413+01:00 /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
2023-01-23T15:44:59.428820874+01:00 /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
2023-01-23T15:44:59.429173908+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
2023-01-23T15:44:59.429979412+01:00 10-listen-on-ipv6-by-default.sh: info: can not modify /etc/nginx/conf.d/default.conf (read-only file system?)
2023-01-23T15:44:59.430071429+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
2023-01-23T15:44:59.431167356+01:00 /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
2023-01-23T15:44:59.431715519+01:00 /docker-entrypoint.sh: Configuration complete; ready for start up
2023-01-23T15:44:59.433513935+01:00 2023/01/23 14:44:59 [emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
2023-01-23T15:44:59.433519229+01:00 nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
2023-01-23T14:45:24.296336176Z Stream closed EOF for nextcloud/nextcloud-web-app-66fc5dfcb7-kxlnp (nextcloud-nginx)
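(For context: only the two [emerg] lines are fatal; the default.conf message is informational. The stock nginx image needs to create /var/cache/nginx/client_temp at startup, which a non-root user can't do. Below is a sketch of the kind of writable mount that would avoid that error; this is an assumption about plain Kubernetes, not something the chart renders today. The cleaner fix that comes up later in this thread is the nginxinc/nginx-unprivileged image, which moves its cache and pid paths to locations writable by its UID 101.)

# Hypothetical pod spec fragment, not current chart output: an emptyDir
# lets a non-root nginx create /var/cache/nginx/client_temp at startup.
containers:
  - name: nextcloud-nginx
    image: nginx:alpine
    volumeMounts:
      - name: nginx-cache
        mountPath: /var/cache/nginx
volumes:
  - name: nginx-cache
    emptyDir: {}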
This is the resulting deployment.yaml when I run kubectl get deployment -n nextcloud nextcloud-web-app -o yaml > deployment.yaml:
Click me for the nextcloud deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2023-01-23T14:43:58Z"
  generation: 52
  labels:
    app.kubernetes.io/component: app
    app.kubernetes.io/instance: nextcloud-web-app
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: nextcloud
    argocd.argoproj.io/instance: nextcloud-web-app
    helm.sh/chart: nextcloud-3.4.1
  name: nextcloud-web-app
  namespace: nextcloud
  resourceVersion: "3340"
  uid: cde1dd07-103a-4c04-931d-071ab3c5b448
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: app
      app.kubernetes.io/instance: nextcloud-web-app
      app.kubernetes.io/name: nextcloud
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        nextcloud-config-hash: d1d9ac6f86f643b460f8e4e8e886b65382ad49aede8762f8ea74ccd86b7e3f28
        nginx-config-hash: 16c61772d9e74de7322870fd3a045598ea01f6e16be155d116423e6a246dcddc
        php-config-hash: 44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: app
        app.kubernetes.io/instance: nextcloud-web-app
        app.kubernetes.io/name: nextcloud
    spec:
      containers:
      - env:
        - name: POSTGRES_HOST
          value: nextcloud-web-app-postgresql
        - name: POSTGRES_DB
          value: nextcloud
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-pgsql-credentials
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              key: nextcloudPassword
              name: nextcloud-pgsql-credentials
        - name: NEXTCLOUD_ADMIN_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-admin-credentials
        - name: NEXTCLOUD_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: nextcloud-admin-credentials
        - name: NEXTCLOUD_TRUSTED_DOMAINS
          value: nextcloud.vleermuis.tech
        - name: NEXTCLOUD_DATA_DIR
          value: /var/www/html/data
        image: nextcloud:25.0.3-fpm
        imagePullPolicy: Always
        name: nextcloud
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/
          name: nextcloud-main
          subPath: root
        - mountPath: /var/www/html
          name: nextcloud-main
          subPath: html
        - mountPath: /var/www/html/data
          name: nextcloud-main
          subPath: data
        - mountPath: /var/www/html/config
          name: nextcloud-main
          subPath: config
        - mountPath: /var/www/html/custom_apps
          name: nextcloud-main
          subPath: custom_apps
        - mountPath: /var/www/tmp
          name: nextcloud-main
          subPath: tmp
        - mountPath: /var/www/html/themes
          name: nextcloud-main
          subPath: themes
        - mountPath: /var/www/html/config/logging.config.php
          name: nextcloud-config
          subPath: logging.config.php
        - mountPath: /var/www/html/config/proxy.config.php
          name: nextcloud-config
          subPath: proxy.config.php
        - mountPath: /var/www/html/config/.htaccess
          name: nextcloud-config
          subPath: .htaccess
        - mountPath: /var/www/html/config/apache-pretty-urls.config.php
          name: nextcloud-config
          subPath: apache-pretty-urls.config.php
        - mountPath: /var/www/html/config/apcu.config.php
          name: nextcloud-config
          subPath: apcu.config.php
        - mountPath: /var/www/html/config/apps.config.php
          name: nextcloud-config
          subPath: apps.config.php
        - mountPath: /var/www/html/config/autoconfig.php
          name: nextcloud-config
          subPath: autoconfig.php
        - mountPath: /var/www/html/config/redis.config.php
          name: nextcloud-config
          subPath: redis.config.php
        - mountPath: /var/www/html/config/smtp.config.php
          name: nextcloud-config
          subPath: smtp.config.php
      - image: nginx:alpine
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Host
              value: nextcloud.vleermuis.tech
            path: /status.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        name: nextcloud-nginx
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Host
              value: nextcloud.vleermuis.tech
            path: /status.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 45
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        securityContext:
          readOnlyRootFilesystem: false
          runAsGroup: 33
          runAsNonRoot: true
          runAsUser: 82
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/
          name: nextcloud-main
          subPath: root
        - mountPath: /var/www/html
          name: nextcloud-main
          subPath: html
        - mountPath: /var/www/html/data
          name: nextcloud-main
          subPath: data
        - mountPath: /var/www/html/config
          name: nextcloud-main
          subPath: config
        - mountPath: /var/www/html/custom_apps
          name: nextcloud-main
          subPath: custom_apps
        - mountPath: /var/www/tmp
          name: nextcloud-main
          subPath: tmp
        - mountPath: /var/www/html/themes
          name: nextcloud-main
          subPath: themes
        - mountPath: /etc/nginx/nginx.conf
          name: nextcloud-nginx-config
          subPath: nginx.conf
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - until pg_isready -h nextcloud-web-app-postgresql -U ${POSTGRES_USER} ; do
          sleep 2 ; done
        env:
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              key: username
              name: nextcloud-pgsql-credentials
        image: bitnami/postgresql:14.4.0-debian-11-r23
        imagePullPolicy: IfNotPresent
        name: postgresql-isready
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 82
        runAsGroup: 33
        runAsNonRoot: true
        runAsUser: 33
      serviceAccount: nextcloud-serviceaccount
      serviceAccountName: nextcloud-serviceaccount
      terminationGracePeriodSeconds: 30
      volumes:
      - name: nextcloud-main
        persistentVolumeClaim:
          claimName: nextcloud-files
      - configMap:
          defaultMode: 420
          name: nextcloud-web-app-config
        name: nextcloud-config
      - configMap:
          defaultMode: 420
          name: nextcloud-web-app-nginxconfig
        name: nextcloud-nginx-config
status:
  conditions:
  - lastTransitionTime: "2023-01-23T14:43:58Z"
    lastUpdateTime: "2023-01-23T14:43:58Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-01-23T14:43:58Z"
    lastUpdateTime: "2023-01-23T14:43:58Z"
    message: ReplicaSet "nextcloud-web-app-66fc5dfcb7" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 52
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
Where does the UID 82 come from? (Edit: it comes from the alpine nextcloud and nginx images - that's www-data)
I set that to 33 (nextcloud's www-data user) to test, but it didn't seem to make a difference. Just so it's clear: without editing any of the security contexts, everything works. But I would like the security context to work, because otherwise my restores from backups fail: the /var/www/html/config directory is always created with root ownership, which means that if the restores run as www-data, they can't restore that particular directory, which is important. I'm hoping the security context fixes that, so that nothing has to run as root in this stack.
I'm deploying the 3.4.1 nextcloud helm chart via Argo CD onto k3s on Ubuntu 22.04.
Update: problem still present in 3.5.7 helm chart.
Adding my experience:
I just moved my directory from a hostPath to NFS and then started encountering permission issues. I ran chown -R 33:33 on the whole directory and set the security context. This is my error now:
Configuring Redis as session handler
/entrypoint.sh: 78: cannot create /usr/local/etc/php/conf.d/redis-session.ini: Permission denied
Initializing nextcloud 25.0.3.2 ...
touch: cannot touch '/var/www/html/nextcloud-init-sync.lock': Permission denied
@FrankelJb are you also using nginx? Which security contexts are you setting? There are a few that you can set. If we could get the security context settings from your values.yaml, that would help in comparing states. Thank you for sharing!
@jessebot I'm not using Nginx. I'm almost ready to give up on NC in kubernetes (I can't upgrade now). I've managed to solve this issue. I was trying to use a single redis cluster for all my services. However, I had to give up on that dream because NC refused to connect without a password. I'm not sure if that's caused by a config in the helm chart or my configuration error. Thanks for being so responsive :)
I'm sorry you're having a bad time with this. I also had a bad time with this at first and then became sort of obsessed with trying to fix it for others too 😅
If you can post your values.yaml (after removing sensitive info) I can help troubleshoot it for you :)
UID 82 comes from the Nextcloud fpm-alpine image. If you use another image instead of alpine, I believe the user is 33. The NGINX container you use is an alpine-based image, so you have to make sure the group and fsGroup match for both containers.
For example my (abbreviated) deployment:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud
  namespace: nextcloud
  labels:
    app: nextcloud
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nextcloud
  template:
    metadata:
      annotations:
        container.apparmor.security.beta.kubernetes.io/nextcloud: localhost/container-nextcloud
        container.apparmor.security.beta.kubernetes.io/nginx: localhost/container-nginx
      labels:
        app: nextcloud
    spec:
      automountServiceAccountToken: false
      containers:
        - name: nextcloud
          image: "nextcloud:24.0.9-fpm-alpine"
          securityContext:
            runAsUser: 82
            allowPrivilegeEscalation: false
            privileged: false
            runAsNonRoot: true
            capabilities:
              drop:
                - ALL
            seccompProfile:
              type: Localhost
              localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json
        - name: nginx
          image: cgr.dev/chainguard/nginx:1.23.3
          securityContext:
            allowPrivilegeEscalation: false
            privileged: false
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            seccompProfile:
              type: Localhost
              localhostProfile: operator/nextcloud/nginx-seccomp-profile.json
      # Will mount configuration files as www-data (id: 82) for nextcloud
      securityContext:
        fsGroup: 82
      serviceAccountName: nextcloud-serviceaccount
You can see I use a distroless NGINX container image, but the principle is the same.
@jessebot here is a link to my values.yaml. I've just tried to recreate with Flux, moving from Argo CD, and it just waits on "Initializing nextcloud 25.0.4.1 ..." for minutes. It was working with the same yaml; the deployment took 45 minutes last time.
@FrankelJb , for Argo CD, I detailed some of my trials in https://github.com/nextcloud/helm/issues/336#issuecomment-1509829893 if that's at all helpful.
For this owned-by-root issue, also discussed in #114 , I finally got around to testing it (after battling argo 😅), and I've noted that all of the securityContext parameters I've tested (nextcloud, nginx, and the nextcloud pod) seem to mostly work, but the following directories are always owned by root on the nextcloud container:
I don't know why, though. At first I thought it was a persistence thing, but then I disabled persistence and it's still an issue. You can see me live testing with the 3.5.7 nextcloud chart here, but each thing I test leads me further toward believing there's something going on with our volume mounts. I've been using the 26.0.0-fpm image, but I haven't tested the regular image or the alpine image like @Jeroen0494 suggested, yet.
Note: This /var/www/html/config directory owned by root doesn't happen when using the nextcloud docker container directly and setting it to run as nonroot. This only happens with the helm chart.
@provokateurin or @tvories have you been able to get this to work? I can get every other directory to be created as any other user, but the directories from the screenshot seem to always be owned by root. You can see my values.yaml here, but I don't know what else we need to set here 🤔 Are there security contexts for persistent volumes? Or perhaps mount options we need to set for the configmap when it gets mounted? It's been months, albeit in my off hours, but I'm still so confused.
@Jeroen0494 , I switched to the 26.0.0-fpm-alpine tag and also added most of the options you'd added and /var/www/html/config is still owned by root when deploying with this helm chart. You can see the full values.yaml I tried here, but the important parts are this:
image:
  repository: nextcloud
  flavor: fpm-alpine
  pullPolicy: Always
nextcloud:
  # Set securityContext parameters. For example, you may need to define runAsNonRoot directive
  securityContext:
    runAsUser: 82
    runAsGroup: 82
    runAsNonRoot: true
    readOnlyRootFilesystem: false
    allowPrivilegeEscalation: false
    privileged: false
    capabilities:
      drop:
        - ALL
  podSecurityContext:
    fsGroup: 82
...
# this is deprecated, but I figured why not, anything to change that one config directory from root (but it didn't work)
securityContext:
  fsGroup: 82
I can't figure out what else it would be. Maybe a script in the container itself? 🤔 Are you using the helm chart and using persistence? Is your /var/www/html/config owned by root? Are you using k3s or another k8s on metal by chance? The only thing I didn't try from your output was this, because I'm not sure where that file comes from or what should go in it:
seccompProfile:
  type: Localhost
  localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json
I see it described here in the k8s api docs, but it doesn't link further for what goes in localhostProfile.
@jessebot Not sure if it is the same issue, but maybe it will help. I'm using 25-alpine with a hostPath PV, and even though I have set the securityContext in the pod and used the same id for the ownership of the path on the host, the mapped subdirectories of the PV were owned by root:root and the container was stuck on:
/entrypoint.sh: 104: cannot create /var/www/html/nextcloud-init-sync.lock: Permission denied
I resolved it by manually changing the ownership of the subdirectories on the host to the same UID.
@tomasodehnal , thanks for popping in to help (in fact, thank you to everyone who has tried to pop in and help with this weird issue 😁). I will take a peek at that. A few questions: are you using k3s or another k8s on metal? Could you post your full PV/PVC manifests, or the section of your values.yaml with that info?
The reason I'm asking is that I'm wondering if it's actually a storage driver problem that has nothing to do with nextcloud. It only seems to be happening consistently for a few directories, and those seem to be mounts from persistent volumes.
Here's one of my PVCs which is using the local path provisioner, since I'm using k3s:
# Dynamic persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
    volumeType: local
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
I'm still looking into whether there's anything that can be done here, but from my research, this might just be something that needs to be solved in an init container, which I might have to make a PR for :(
Update: I tested without any values.yaml at all, using all default settings on k3s with chart version 3.5.8, and only nextcloud-init-sync.lock is owned by root, like this:
-rw-r--r-- 1 root www-data 0 Apr 16 09:01 nextcloud-init-sync.lock
but that's without any persistence or configurations enabled 🤔
Re: nextcloud-init-sync.lock
That file is actually owned by root by default in all the nextcloud docker containers, but only that one file (this occurs both in the docker container directly and in the helm chart).
Example Default Permissions on nextcloud:fpm-alpine Docker Container
$ docker run -d nextcloud:fpm-alpine
Unable to find image 'nextcloud:fpm-alpine' locally
fpm-alpine: Pulling from library/nextcloud
f56be85fc22e: Pull complete
ace8de9a4ff5: Pull complete
ac818333da4c: Pull complete
10f4138fad9a: Pull complete
04049f99cb8d: Pull complete
93231f0bdcb6: Pull complete
ab266ad8891c: Pull complete
552295b4d6d8: Pull complete
cffafb46943d: Pull complete
4964abd498c6: Pull complete
a05442d246e3: Pull complete
42633b5b39c2: Pull complete
6f8014cbce5e: Pull complete
18729ba22f88: Pull complete
9eedd0061e2b: Pull complete
97d1b1593a77: Pull complete
Digest: sha256:9a08c42558cda7d48de2cc3da9f5150eeed81e7595aa4c2c5ace6612c3923240
Status: Downloaded newer image for nextcloud:fpm-alpine
688a243c0388ca26541b0d39cc5ebe3c83ad41df617aa601e28e08a258319dfa
$ docker exec -it frosty_mendel /bin/sh
/var/www/html # ls -hal
total 180K
drwxrwxrwt 15 www-data www-data 4.0K Apr 16 08:42 .
drwxrwxr-x 1 www-data root 4.0K Apr 14 20:46 ..
-rw-r--r-- 1 www-data www-data 3.2K Apr 16 08:42 .htaccess
-rw-r--r-- 1 www-data www-data 101 Apr 16 08:42 .user.ini
drwxr-xr-x 45 www-data www-data 4.0K Apr 16 08:42 3rdparty
-rw-r--r-- 1 www-data www-data 18.9K Apr 16 08:42 AUTHORS
-rw-r--r-- 1 www-data www-data 33.7K Apr 16 08:42 COPYING
drwxr-xr-x 50 www-data www-data 4.0K Apr 16 08:42 apps
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:42 config
-rw-r--r-- 1 www-data www-data 4.0K Apr 16 08:42 console.php
drwxr-xr-x 24 www-data www-data 4.0K Apr 16 08:42 core
-rw-r--r-- 1 www-data www-data 6.2K Apr 16 08:42 cron.php
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:42 custom_apps
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:42 data
drwxr-xr-x 2 www-data www-data 12.0K Apr 16 08:42 dist
-rw-r--r-- 1 www-data www-data 156 Apr 16 08:42 index.html
-rw-r--r-- 1 www-data www-data 3.4K Apr 16 08:42 index.php
drwxr-xr-x 6 www-data www-data 4.0K Apr 16 08:42 lib
-rw-r--r-- 1 root root 0 Apr 16 08:42 nextcloud-init-sync.lock
-rwxr-xr-x 1 www-data www-data 283 Apr 16 08:42 occ
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:42 ocm-provider
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:42 ocs
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:42 ocs-provider
-rw-r--r-- 1 www-data www-data 3.1K Apr 16 08:42 public.php
-rw-r--r-- 1 www-data www-data 5.4K Apr 16 08:42 remote.php
drwxr-xr-x 4 www-data www-data 4.0K Apr 16 08:42 resources
-rw-r--r-- 1 www-data www-data 26 Apr 16 08:42 robots.txt
-rw-r--r-- 1 www-data www-data 2.4K Apr 16 08:42 status.php
drwxr-xr-x 3 www-data www-data 4.0K Apr 16 08:42 themes
-rw-r--r-- 1 www-data www-data 384 Apr 16 08:42 version.php
Running docker with --user 82:82 fixes the issue on the alpine image (you'd use 33 for the non-alpine images), as you can see here (but that's not helpful for k8s itself 😞, since this was using docker directly):
Example Fixed Permissions on nextcloud:fpm-alpine Docker Container
$ docker run -d --user 82:82 nextcloud:fpm-alpine
9761e3ff869b3ad026ef5bf10b333d5c52c2ec0ad6b5dd212016d083c8888dd3
$ docker exec -it eager_buck /bin/sh
/var/www/html $ ls -hal
total 180K
drwxrwxrwt 15 www-data root 4.0K Apr 16 08:48 .
drwxrwxr-x 1 www-data root 4.0K Apr 14 20:46 ..
-rw-r--r-- 1 www-data www-data 3.2K Apr 16 08:48 .htaccess
-rw-r--r-- 1 www-data www-data 101 Apr 16 08:48 .user.ini
drwxr-xr-x 45 www-data www-data 4.0K Apr 16 08:48 3rdparty
-rw-r--r-- 1 www-data www-data 18.9K Apr 16 08:48 AUTHORS
-rw-r--r-- 1 www-data www-data 33.7K Apr 16 08:48 COPYING
drwxr-xr-x 50 www-data www-data 4.0K Apr 16 08:48 apps
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:48 config
-rw-r--r-- 1 www-data www-data 4.0K Apr 16 08:48 console.php
drwxr-xr-x 24 www-data www-data 4.0K Apr 16 08:48 core
-rw-r--r-- 1 www-data www-data 6.2K Apr 16 08:48 cron.php
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:48 custom_apps
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:48 data
drwxr-xr-x 2 www-data www-data 12.0K Apr 16 08:48 dist
-rw-r--r-- 1 www-data www-data 156 Apr 16 08:48 index.html
-rw-r--r-- 1 www-data www-data 3.4K Apr 16 08:48 index.php
drwxr-xr-x 6 www-data www-data 4.0K Apr 16 08:48 lib
-rw-r--r-- 1 www-data www-data 0 Apr 16 08:48 nextcloud-init-sync.lock
-rwxr-xr-x 1 www-data www-data 283 Apr 16 08:48 occ
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:48 ocm-provider
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:48 ocs
drwxr-xr-x 2 www-data www-data 4.0K Apr 16 08:48 ocs-provider
-rw-r--r-- 1 www-data www-data 3.1K Apr 16 08:48 public.php
-rw-r--r-- 1 www-data www-data 5.4K Apr 16 08:48 remote.php
drwxr-xr-x 4 www-data www-data 4.0K Apr 16 08:48 resources
-rw-r--r-- 1 www-data www-data 26 Apr 16 08:48 robots.txt
-rw-r--r-- 1 www-data www-data 2.4K Apr 16 08:48 status.php
drwxr-xr-x 3 www-data www-data 4.0K Apr 16 08:48 themes
-rw-r--r-- 1 www-data www-data 384 Apr 16 08:48 version.php
@jessebot are you experiencing these storage permission errors only on already existing storage or also when using an emptyDir for example?
When using existing storage where the owner of the files is root, a non-root container won't be able to change the owner. You'd have to change the owner on the storage medium itself with a chown.
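For example, something like this one-off pod could do the chown from inside the cluster (just a sketch; it assumes the chart's nextcloud-files claim and the debian www-data IDs, and on k3s local-path you could equally chown the host directory directly):

# Hypothetical one-off pod to fix ownership on an existing volume:
apiVersion: v1
kind: Pod
metadata:
  name: fix-nextcloud-perms
  namespace: nextcloud
spec:
  restartPolicy: Never
  containers:
    - name: chown
      image: busybox:1.36
      # 33:33 is www-data on the debian images; use 82:82 for alpine
      command: ["sh", "-c", "chown -R 33:33 /data"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nextcloud-files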
Does the issue exist when using no attached storage? And when using emptyDir? And when using a PVC template with the local-path-provisioner?
I can't figure out what else it would be. Maybe a script in the container itself? 🤔 Are you using the helm chart and using persistence? Is your /var/www/html/config owned by root? Are you using k3s or another k8s on metal by chance? The only thing I didn't try from your output was this, because I'm not sure where that file comes from or what should go in it:
seccompProfile:
  type: Localhost
  localhostProfile: operator/nextcloud/nextcloud-seccomp-profile.json
I see it described here in the k8s api docs, but it doesn't link further for what goes in localhostProfile.
I'm using the security profiles operator and have written my own seccomp profile. You may ignore this line, or switch type to RuntimeDefault.
Currently I'm not using the Helm chart, because I require certain changes (that I've created a PR for). But all my YAMLs are based on the Helm chart.
Thanks for getting back to me, @Jeroen0494 🙏
Currently I'm not using the Helm chart, because I require certain changes (that I've created a PR for). But all my YAMLs are based on the Helm chart.
Commented on that PR and will take another look after conflicts are resolved :) Will still probably ping Kate in though, as the PR is large.
@jessebot are you experiencing these storage permission errors only on already existing storage or also when using an emptyDir for example?
Let me try with emptyDir, actually 🤔. I've been doing this on a fresh k3s cluster each time: I completely destroy the cluster and its storage before testing a new cluster. I checked /var/lib/rancher after removing k3s and there isn't anything in that directory; the directory itself is owned by root, but the directories within it should not be. I use smol-k8s-lab for deploying and destroying local k3s clusters. Let me spin up a new cluster and check the ownership of the directory after that.
Does the issue exist when using no attached storage?
No, the issue doesn't exist when I don't use any persistence. Well, except for the nextcloud-init-sync.lock file, which is always owned by root, but that's not what I'm after right now; I'm after the /var/www/html/config dir. I detailed more info on that lock file here: https://github.com/nextcloud/helm/issues/335#issuecomment-1510203221
Could you also try with a local mount, instead of using the local path provisioner?
For example, my PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud-data
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 50Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nextcloud-data
    namespace: nextcloud
  local:
    path: /data/crypt/nextcloud/data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - mediaserver.fritz.box
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
Here's what else I tried recently:
I do not know how to set an emptyDir with the current values.yaml 🤔
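(If I understand the usual Helm pattern correctly, which is something to verify against this chart's templates/deployment.yaml, disabling persistence swaps the PVC for an emptyDir, so the main volume would render roughly as:)

# Assumed rendering with persistence.enabled: false (verify against the chart):
volumes:
  - name: nextcloud-main
    emptyDir: {}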
Creating a Persistent Volume with spec.hostPath.path
I was previously using a dynamic PVC, but here's the new setup I tried, using the 26.0.0-fpm tag again this time. I only changed the securityContext for the nextcloud container, since nginx isn't what I'm troubleshooting, and I didn't set nextcloud.podSecurityContext. Here's the PV and existing PVC for nextcloud files:
PV and PVC yaml
---
kind: PersistentVolume
apiVersion: v1
metadata:
  namespace: nextcloud
  name: nextcloud
spec:
  storageClassName: local-path
  capacity:
    storage: 11Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: '/data/nextcloud'
---
# Dynamic persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
spec:
  volumeName: nextcloud
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
The above still failed, so I'm beginning to think this is k3s related... especially because I created the directory I specified as user 33:33, which is also www-data on the host machine.
I found this k3s issue, #3704, and whatever the fix was just didn't seem to work? There's another PR opened here, #7217, which may fix it, but 🤷
Creating a Persistent Volume with spec.local.path
Next I tried the second thing you suggested, @Jeroen0494 , with a PV that has spec.local.path like this, making sure that /data/nextcloud was cleaned between runs and owned by www-data, which is UID 33 both in the securityContext for the nextcloud container and on the host node:
PV and PVC yaml
---
# using a local path instead of the local-path provisioner directly
apiVersion: v1
kind: PersistentVolume
metadata:
  namespace: nextcloud
  name: nextcloud
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nextcloud-files
    namespace: nextcloud
  local:
    path: /data/nextcloud/
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - compufam
---
# persistent volume claim for nextcloud data (/var/www/html) to persist
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: nextcloud
  name: nextcloud-files
  annotations:
    k8up.io/backup: "true"
spec:
  volumeName: nextcloud
  # tried with AND *without* storageClassName set in the pvc
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
This also fails, and what's weird is that I'm not using the alpine container for nextcloud, but it still changed the group ownership to 82 and left the user as root for all the same directories as before 🤷 :
Edit: I just realized I left spec.storageClassName: local-path in the persistent volume claim, so I tried again without it, with the same result: group ownership of 82, as above before the edit. I think we need to fix that, because it's coming from the deployment.yaml here, which always sets the fsGroup for the nextcloud pod to 82 if nginx is enabled. But using the nginx-alpine container doesn't mean that a user is using an alpine nextcloud container, so setting the fsGroup to 82 there doesn't make sense:
https://github.com/nextcloud/helm/blob/3ad31c7461c4c3b58e0662ff6b4bdd1754dff7f2/charts/nextcloud/templates/deployment.yaml#L332-L344
Submitted PR here: https://github.com/nextcloud/helm/pull/379 (but that would only fix the group ownership, not the user ownership)
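The shape of the fix is roughly this (a sketch of the idea, not the merged diff): only default fsGroup to 82 when the nextcloud image flavor is actually alpine, instead of whenever nginx.enabled is true.

# Sketch only, not the merged template code:
securityContext:
  {{- if .Values.nextcloud.podSecurityContext }}
  {{- toYaml .Values.nextcloud.podSecurityContext | nindent 2 }}
  {{- else if and .Values.nginx.enabled (contains "alpine" .Values.image.flavor) }}
  # www-data is 82 in the alpine images and 33 in the debian-based ones
  fsGroup: 82
  {{- end }}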
Current thoughts...
Perhaps, since bitnami's postgres chart also provides an init container to get around this, we should just provide that as well, since k3s and rancher are pretty popular. It's not pretty, but I don't really see a way around this so far. (There is a beta rootless mode for k3s, but I haven't dug into that yet.)
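For reference, here's a minimal sketch of what such an init container could look like in the pod spec (hypothetical, not something the chart ships today; it has to run as root to be able to chown, which is exactly the ugly part):

# Hypothetical init container; volume name and subPath match the deployment above
initContainers:
  - name: volume-permissions
    image: busybox:1.36
    # 33:33 is www-data on the debian images; use 82:82 for alpine
    command: ["sh", "-c", "chown -R 33:33 /var/www/html/config"]
    securityContext:
      runAsUser: 0  # root is required to change ownership on the volume
    volumeMounts:
      - name: nextcloud-main
        mountPath: /var/www/html/config
        subPath: config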
@tomasodehnal , thanks for popping in to help (in fact, thank you to everyone who has tried to pop in and help with this weird issue 😁). I will take a peek at that. A few questions: are you using k3s or another k8s on metal? Could you post your full PV/PVC manifests, or the section of your values.yaml with that info?
The reason I'm asking is that I'm wondering if it's actually a storage driver problem that has nothing to do with nextcloud. It only seems to be happening consistently for a few directories, and those seem to be mounts from persistent volumes.
@jessebot It's K3s on a Ubuntu VM on ESXi.
This is the manifest I use for the persistence.nextcloud volume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud
  labels:
    type: local
spec:
  storageClassName: nextcloud
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/nextcloud"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud
  namespace: nextcloud
spec:
  storageClassName: nextcloud
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
And the respective excerpt from the values.yaml:
nextcloud:
  podSecurityContext:
    runAsUser: 1003
    runAsGroup: 1003
    runAsNonRoot: true
    fsGroup: 1003
persistence:
  enabled: true
  existingClaim: nextcloud
I was testing with a fresh install without existing claims, and I would say it works as expected:
- the volume is created using the local-path provider (as the default StorageClass was used)
- nextcloud works and setup proceeds even when the owner is still root:root, because the host paths have a 777 bitmask (based on your k3s issue links I'm not sure if this is the expected current behavior of the provisioner, but that's how it worked here)
Looking into your manifest, there is one thing I noticed. You say you use the local-path provider, but I believe that might not be the case. The reason is that you are creating the PV yourself, and the name local-path is then used by the claim as a reference to the existing PV; it is not the actual storage class. You can easily find out from the annotations of the PVC: do you see volume.kubernetes.io/storage-provisioner: rancher.io/local-path?
The PV should be created by the provider, so you might remove the PV definition from the manifest and keep only the PVC.
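For example, a dynamically provisioned claim shows the provisioner in its annotations like this (an illustrative excerpt; the names and the generated volume name are just examples):

# Illustrative excerpt of a dynamically provisioned PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-files
  namespace: nextcloud
  annotations:
    volume.kubernetes.io/storage-provisioner: rancher.io/local-path
spec:
  storageClassName: local-path
  volumeName: pvc-0a1b2c3d-example  # auto-generated PV, not a hand-created one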
I think the issue lies in the storage provider one uses and is not related to nextcloud:
- when using a 'manual' PV with hostPath, we are on our own with the permissions, as there is no one else to take care of them
- when using a provisioner that creates the PV for you based on the PVC settings, it might work, if the provisioner supports permissions handling
If you want to resolve it regardless of the storage used, I would say an init container is the safe bet, but it will need privileged permissions.
One other observation: fsGroup was not respected in my test, as I'm on 1.25.3. It looks like it might be supported only since 1.25.4: https://github.com/k3s-io/k3s/issues/6401.
Popping in very quickly to say I tested this on GKE with the kubernetes.io/gce-pd provisioner, and the same issue happens :( :
root@nextcloud-web-app-68f6bb8fb6-nblkq:/var/www/html# ls -hal
total 196K
drwxrwsr-x 15 www-data www-data 4.0K Apr 23 14:52 .
drwxrwsr-x 4 root 82 4.0K Apr 23 14:52 ..
-rw-r--r-- 1 www-data www-data 3.2K Apr 23 14:52 .htaccess
-rw-r--r-- 1 www-data www-data 101 Apr 23 14:52 .user.ini
drwxr-sr-x 45 www-data www-data 4.0K Apr 23 14:52 3rdparty
-rw-r--r-- 1 www-data www-data 19K Apr 23 14:52 AUTHORS
-rw-r--r-- 1 www-data www-data 34K Apr 23 14:52 COPYING
drwxr-sr-x 50 www-data www-data 4.0K Apr 23 14:52 apps
drwxrwsr-x 2 root 82 4.0K Apr 23 14:52 config
-rw-r--r-- 1 www-data www-data 4.0K Apr 23 14:52 console.php
drwxr-sr-x 24 www-data www-data 4.0K Apr 23 14:52 core
-rw-r--r-- 1 www-data www-data 6.2K Apr 23 14:52 cron.php
drwxrwsr-x 2 www-data www-data 4.0K Apr 23 14:52 custom_apps
drwxrwsr-x 2 www-data www-data 4.0K Apr 23 14:52 data
drwxr-sr-x 2 www-data www-data 12K Apr 23 14:52 dist
-rw-r--r-- 1 www-data www-data 156 Apr 23 14:52 index.html
-rw-r--r-- 1 www-data www-data 3.4K Apr 23 14:52 index.php
drwxr-sr-x 6 www-data www-data 4.0K Apr 23 14:52 lib
-rw-r--r-- 1 root 82 0 Apr 23 14:52 nextcloud-init-sync.lock
-rw-r----- 1 www-data www-data 14K Apr 23 14:54 nextcloud.log
-rwxr-xr-x 1 www-data www-data 283 Apr 23 14:52 occ
drwxr-sr-x 2 www-data www-data 4.0K Apr 23 14:52 ocm-provider
drwxr-sr-x 2 www-data www-data 4.0K Apr 23 14:52 ocs
drwxr-sr-x 2 www-data www-data 4.0K Apr 23 14:52 ocs-provider
-rw-r--r-- 1 www-data www-data 3.1K Apr 23 14:52 public.php
-rw-r--r-- 1 www-data www-data 5.5K Apr 23 14:52 remote.php
drwxr-sr-x 4 www-data www-data 4.0K Apr 23 14:52 resources
-rw-r--r-- 1 www-data www-data 26 Apr 23 14:52 robots.txt
-rw-r--r-- 1 www-data www-data 2.4K Apr 23 14:52 status.php
drwxrwsr-x 3 www-data www-data 4.0K Apr 23 14:52 themes
-rw-r--r-- 1 www-data www-data 384 Apr 23 14:52 version.php
I don't think this is specific to k3s anymore 🤔
@jessebot I think I ran into the same issue https://github.com/nextcloud/helm/issues/504 and I saw your perseverance tackling this... config folder is owned by root:root and thus the folder is empty. Were you able to find a fix for this issue?
I also stumbled onto this situation, where mounting a rancher.io/local-path PVC in k3s results in the directory being owned by root. Setting securityContext.fsGroup does change the directory group, just not the owner.
I also observed the same behaviour with the kubernetes.io/aws-ebs provisioner on EKS. I am not sure if this is actually a bug or if it is just working as expected; at least from these discussions, it seems like this is known behaviour:
- https://github.com/kubernetes/kubernetes/issues/2630#issuecomment-502205824
- https://github.com/kubernetes/kubernetes/issues/2630#issuecomment-1196837872
...
Anyway, at least for my use case, I was able to get a non-root nextcloud container running by setting the config.php option check_data_directory_permissions to false. I also got non-root nginx running by using the nginxinc/nginx-unprivileged:alpine image.
Below is a partial extract from my values.yaml file. Maybe this can help someone in the same boat?
image:
  flavor: fpm
persistence:
  enabled: true
  existingClaim: nextcloud-pvc
nextcloud:
  ...
  podSecurityContext:
    runAsUser: 33
    runAsGroup: 33
    runAsNonRoot: true
    readOnlyRootFilesystem: false
  configs:
    custom.config.php: |
      <?php
      $CONFIG = array(
        'check_data_directory_permissions' => false, # https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/
      );
nginx:
  enabled: true
  image:
    repository: nginxinc/nginx-unprivileged
    tag: alpine
    pullPolicy: IfNotPresent
  securityContext:
    runAsUser: 101
    runAsGroup: 101
    runAsNonRoot: true
    readOnlyRootFilesystem: false
...
PVC definition:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-pvc
  namespace: nextcloud
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
  storageClassName: local-path
  volumeMode: Filesystem