cockroach-operator
Permission denied creating the data directory
I'm stuck with the following error when trying to create any kind of CockroachDB cluster using the operator:
E240215 20:18:40.885312 1 1@cli/clierror/check.go:35 [-] 1 ERROR: connection lost.
E240215 20:18:40.885312 1 1@cli/clierror/check.go:35 [-] 1 +creating data directory: mkdir /cockroach/cockroach-data/auxiliary: permission denied
ERROR: connection lost.
creating data directory: mkdir /cockroach/cockroach-data/auxiliary: permission denied
Failed running "start"
The cluster manifest might look like this:
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: primary-crdb
spec:
  cockroachDBVersion: v23.1.11
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "1Gi"
        storageClassName: primary-nfs
        volumeMode: Filesystem
  nodes: 3
  resources:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 2Gi
  tlsEnabled: true
The storage class is for csi-driver-nfs and leads to the following directory tree:
$ ls -lahF /<nfs-csi-dir>/*
/<nfs-csi-dir>/pvc-40b518b5-bccc-4610-b804-0bd2175f5eed:
total 18K
drwxrwsr-x 2 root 1000581000 2 Feb 11 16:26 ./
drwxr-xr-x 6 root root 6 Feb 11 17:07 ../
The CockroachDB pod manifest (kubectl get pods primary-crdb-0 --output yaml) has the following security context:
securityContext:
  fsGroup: 1000581000
  runAsUser: 1000581000
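This explains why the permissions don't add up: the volume directory is owned by root with group 1000581000, and the pod sets runAsUser and fsGroup but not runAsGroup.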
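To make the mismatch concrete, here is a minimal sketch of the classic Unix owner/group/other write check applied to the directory listing above. Two things in it are assumptions and not confirmed by anything in this report: that the container's primary GID is 0 (the pod manifest sets no runAsGroup and the image default isn't shown), and that the NFS backend may not honor supplementary groups such as the one added by fsGroup. The names dirEntry, proc, canWrite and honorSupplementary are made up for illustration.

```go
package main

import "fmt"

// dirEntry holds the ownership and permission bits of a directory, as shown by ls.
type dirEntry struct {
	UID, GID uint32
	Mode     uint32 // permission bits only, e.g. 0o775 for rwxrwxr-x
}

// proc is the identity a process presents to the filesystem.
type proc struct {
	UID, GID uint32
	Groups   []uint32 // supplementary groups (fsGroup ends up here)
}

// canWrite applies the owner/group/other check for write access.
// It ignores the root bypass, which is irrelevant for a non-root UID.
// honorSupplementary is a hypothetical switch modelling a server that
// ignores the client's supplementary group list (an assumption).
func canWrite(d dirEntry, p proc, honorSupplementary bool) bool {
	if p.UID == d.UID {
		return d.Mode&0o200 != 0
	}
	inGroup := p.GID == d.GID
	if honorSupplementary {
		for _, g := range p.Groups {
			if g == d.GID {
				inGroup = true
			}
		}
	}
	if inGroup {
		return d.Mode&0o020 != 0
	}
	return d.Mode&0o002 != 0
}

func main() {
	// drwxrwsr-x root 1000581000, taken from the listing above.
	dir := dirEntry{UID: 0, GID: 1000581000, Mode: 0o775}

	// Assumed identity of the CockroachDB container: primary GID 0.
	crdb := proc{UID: 1000581000, GID: 0, Groups: []uint32{1000581000}}
	// Same identity with runAsGroup set to the fsGroup value.
	withGroup := proc{UID: 1000581000, GID: 1000581000, Groups: []uint32{1000581000}}

	fmt.Println(canWrite(dir, crdb, true))       // true
	fmt.Println(canWrite(dir, crdb, false))      // false
	fmt.Println(canWrite(dir, withGroup, false)) // true
}
```

If the backend does honor supplementary groups, the check passes even without runAsGroup, which would point at something specific to my setup instead.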
For comparison, using this storage setup, it is possible to create a working mount like this:
...
containers:
  - name: busybox
    image: busybox:1.28
    command: [ "sh", "-c", "sleep 1h" ]
    volumeMounts:
      - name: data
        mountPath: "/test"
securityContext:
  runAsUser: 2000
  runAsGroup: 2000
  fsGroup: 2000
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test
When creating a file (touch /test/file) from inside the container, the directory tree looks like this:
$ ls -lahF /<nfs-csi-dir>/*
/<nfs-csi-dir>/pvc-730e175e-af46-4e48-b4e4-5a1dd568307d:
total 19K
drwxrwsr-x 2 root 2000 3 Feb 11 17:14 ./
drwxr-xr-x 6 root root 6 Feb 11 17:07 ../
-rw-rw-r-- 1 2000 2000 0 Feb 11 17:14 file
It works because the user and group IDs all match.
I'm wondering whether the operator should specify runAsGroup, or whether there is something unusual about my setup and this should not be necessary at all.
The locations in the code would be the following:
- https://github.com/cockroachdb/cockroach-operator/blob/v2.12.0/pkg/resource/statefulset.go#L208
- https://github.com/cockroachdb/cockroach-operator/blob/v2.12.0/pkg/resource/job.go#L95
Even though I don't have much experience in self-hosting storage for Kubernetes, I would say adding runAsGroup is the right idea, and I'm happy to create a PR if wanted.
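As a rough idea of what such a change could look like, here is a minimal sketch that defaults runAsGroup to the fsGroup value when it is unset. It is not the operator's code: podSecurityContext is a local stand-in for the RunAsUser, RunAsGroup and FSGroup fields of the Kubernetes PodSecurityContext type, and int64Ptr and withRunAsGroup are hypothetical helpers. How the linked lines in statefulset.go and job.go actually build the context would need to be checked before turning this into a PR.

```go
package main

import "fmt"

// podSecurityContext is a local stand-in for the three relevant fields of
// the Kubernetes PodSecurityContext type; it is not the operator's type.
type podSecurityContext struct {
	RunAsUser  *int64
	RunAsGroup *int64
	FSGroup    *int64
}

// int64Ptr returns a pointer to v (hypothetical helper).
func int64Ptr(v int64) *int64 { return &v }

// withRunAsGroup returns a copy of sc in which RunAsGroup defaults to the
// FSGroup value when RunAsGroup is unset. An explicit RunAsGroup is kept.
func withRunAsGroup(sc podSecurityContext) podSecurityContext {
	if sc.RunAsGroup == nil && sc.FSGroup != nil {
		sc.RunAsGroup = int64Ptr(*sc.FSGroup)
	}
	return sc
}

func main() {
	// Values taken from the pod manifest shown earlier.
	sc := withRunAsGroup(podSecurityContext{
		RunAsUser: int64Ptr(1000581000),
		FSGroup:   int64Ptr(1000581000),
	})
	fmt.Println(*sc.RunAsUser, *sc.RunAsGroup, *sc.FSGroup)
}
```

Defaulting to fsGroup mirrors the busybox example, where runAsGroup and fsGroup were both 2000; whether that is the right default for every platform is an open question.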