cloud-on-k8s
elastic-internal-init-filesystem does not prepare the data directory
Bug Report
What did you do?
I installed the quickstart Elasticsearch cluster from the docs into a namespace managed by operator version 2.5.0.
What did you expect to see?
I expected the elasticsearch pod to start with an initContainer elastic-internal-init-filesystem
which would prepare the mounted PVC for the data directory (elasticsearch-data) with the correct ownership and octal permissions.
What did you see instead? Under which circumstances?
Instead, the elastic-internal-init-filesystem init container does not update the volume mount, leaving it unwritable. Elasticsearch fails with the following error (logs from the Elasticsearch container and from elastic-internal-init-filesystem below):
k logs quickstart-es-default-0
{"@timestamp":"2022-11-09T12:37:41.381Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"quickstart-es-default-0","elasticsearch.cluster.name":"quickstart","error.type":"java.lang.IllegalStateException","error.message":"failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?","error.stack_trace":"java.lang.IllegalStateException: failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?\n\tat [email protected]/org.elasticsearch.env.NodeEnvironment.
(NodeEnvironment.java:285)\n\tat [email protected]/org.elasticsearch.node.Node. (Node.java:469)\n\tat [email protected]/org.elasticsearch.node.Node. (Node.java:316)\n\tat [email protected]/org.elasticsearch.bootstrap.Elasticsearch$2. (Elasticsearch.java:214)\n\tat [email protected]/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:214)\n\tat [email protected]/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\nCaused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data\n\tat [email protected]/org.elasticsearch.env.NodeEnvironment$NodeLock. (NodeEnvironment.java:230)\n\tat [email protected]/org.elasticsearch.env.NodeEnvironment$NodeLock. (NodeEnvironment.java:198)\n\tat [email protected]/org.elasticsearch.env.NodeEnvironment. (NodeEnvironment.java:277)\n\t... 5 more\nCaused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/node.lock\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:825)\n\tat [email protected]/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:94)\n\tat [email protected]/org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:43)\n\tat [email protected]/org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)\n\tat [email protected]/org.elasticsearch.env.NodeEnvironment$NodeLock. (NodeEnvironment.java:223)\n\t... 
7 more\n\tSuppressed: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/node.lock\n\t\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\t\tat java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)\n\t\tat java.base/java.nio.file.Files.newByteChannel(Files.java:380)\n\t\tat java.base/java.nio.file.Files.createFile(Files.java:658)\n\t\tat [email protected]/org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:84)\n\t\t... 10 more\n"} ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/quickstart.log {"timestamp": "2022-11-09T12:37:41+00:00", "message": "readiness probe failed", "curl_rc": "7"} ERROR: Elasticsearch exited unexpectedly
k logs quickstart-es-default-0 elastic-internal-init-filesystem
Starting init script Linking /mnt/elastic-internal/xpack-file-realm/users to /usr/share/elasticsearch/config/users Linking /mnt/elastic-internal/xpack-file-realm/roles.yml to /usr/share/elasticsearch/config/roles.yml Linking /mnt/elastic-internal/xpack-file-realm/users_roles to /usr/share/elasticsearch/config/users_roles Linking /mnt/elastic-internal/elasticsearch-config/elasticsearch.yml to /usr/share/elasticsearch/config/elasticsearch.yml Linking /mnt/elastic-internal/unicast-hosts/unicast_hosts.txt to /usr/share/elasticsearch/config/unicast_hosts.txt Linking /mnt/elastic-internal/xpack-file-realm/service_tokens to /usr/share/elasticsearch/config/service_tokens File linking duration: 0 sec. Copying /usr/share/elasticsearch/config/* to /mnt/elastic-internal/elasticsearch-config-local/ '/usr/share/elasticsearch/config/elasticsearch-plugins.example.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch-plugins.example.yml' '/usr/share/elasticsearch/config/elasticsearch.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch.yml' '/usr/share/elasticsearch/config/http-certs' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs' '/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674' '/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674/ca.crt' '/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674/tls.crt' '/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674/tls.key' '/usr/share/elasticsearch/config/http-certs/..data' -> 
'/mnt/elastic-internal/elasticsearch-config-local/http-certs/..data' '/usr/share/elasticsearch/config/http-certs/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.key' '/usr/share/elasticsearch/config/http-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/ca.crt' '/usr/share/elasticsearch/config/http-certs/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.crt' '/usr/share/elasticsearch/config/jvm.options' -> '/mnt/elastic-internal/elasticsearch-config-local/jvm.options' '/usr/share/elasticsearch/config/jvm.options.d' -> '/mnt/elastic-internal/elasticsearch-config-local/jvm.options.d' '/usr/share/elasticsearch/config/log4j2.file.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.file.properties' '/usr/share/elasticsearch/config/log4j2.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.properties' '/usr/share/elasticsearch/config/role_mapping.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/role_mapping.yml' '/usr/share/elasticsearch/config/roles.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/roles.yml' '/usr/share/elasticsearch/config/service_tokens' -> '/mnt/elastic-internal/elasticsearch-config-local/service_tokens' '/usr/share/elasticsearch/config/transport-remote-certs' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs' '/usr/share/elasticsearch/config/transport-remote-certs/..2022_11_09_12_24_31.2343229866' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..2022_11_09_12_24_31.2343229866' '/usr/share/elasticsearch/config/transport-remote-certs/..2022_11_09_12_24_31.2343229866/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..2022_11_09_12_24_31.2343229866/ca.crt' '/usr/share/elasticsearch/config/transport-remote-certs/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..data' 
'/usr/share/elasticsearch/config/transport-remote-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/ca.crt' '/usr/share/elasticsearch/config/unicast_hosts.txt' -> '/mnt/elastic-internal/elasticsearch-config-local/unicast_hosts.txt' '/usr/share/elasticsearch/config/users' -> '/mnt/elastic-internal/elasticsearch-config-local/users' '/usr/share/elasticsearch/config/users_roles' -> '/mnt/elastic-internal/elasticsearch-config-local/users_roles' Empty dir /usr/share/elasticsearch/plugins Copying /usr/share/elasticsearch/bin/* to /mnt/elastic-internal/elasticsearch-bin-local/ '/usr/share/elasticsearch/bin/elasticsearch' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch' '/usr/share/elasticsearch/bin/elasticsearch-certgen' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-certgen' '/usr/share/elasticsearch/bin/elasticsearch-certutil' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-certutil' '/usr/share/elasticsearch/bin/elasticsearch-cli' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-cli' '/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-create-enrollment-token' '/usr/share/elasticsearch/bin/elasticsearch-croneval' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-croneval' '/usr/share/elasticsearch/bin/elasticsearch-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-env' '/usr/share/elasticsearch/bin/elasticsearch-env-from-file' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-env-from-file' '/usr/share/elasticsearch/bin/elasticsearch-geoip' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-geoip' '/usr/share/elasticsearch/bin/elasticsearch-keystore' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-keystore' '/usr/share/elasticsearch/bin/elasticsearch-node' -> 
'/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-node' '/usr/share/elasticsearch/bin/elasticsearch-plugin' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-plugin' '/usr/share/elasticsearch/bin/elasticsearch-reconfigure-node' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-reconfigure-node' '/usr/share/elasticsearch/bin/elasticsearch-reset-password' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-reset-password' '/usr/share/elasticsearch/bin/elasticsearch-saml-metadata' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-saml-metadata' '/usr/share/elasticsearch/bin/elasticsearch-service-tokens' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-service-tokens' '/usr/share/elasticsearch/bin/elasticsearch-setup-passwords' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-setup-passwords' '/usr/share/elasticsearch/bin/elasticsearch-shard' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-shard' '/usr/share/elasticsearch/bin/elasticsearch-sql-cli' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-sql-cli' '/usr/share/elasticsearch/bin/elasticsearch-sql-cli-8.5.0.jar' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-sql-cli-8.5.0.jar' '/usr/share/elasticsearch/bin/elasticsearch-syskeygen' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-syskeygen' '/usr/share/elasticsearch/bin/elasticsearch-users' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-users' Files copy duration: 0 sec. chown duration: 0 sec. waiting for the transport certificates (/mnt/elastic-internal/transport-certificates/quickstart-es-default-0.tls.key) wait duration: 1 sec. 
Linking /usr/share/elasticsearch/config/transport-certs/quickstart-es-default-0.tls.crt to /mnt/elastic-internal/elasticsearch-config-local/node-transport-cert/transport.tls.crt Linking /usr/share/elasticsearch/config/transport-certs/quickstart-es-default-0.tls.crt to /mnt/elastic-internal/elasticsearch-config-local/node-transport-cert/transport.tls.crt Certs linking duration: 0 sec. Init script successful Script duration: 1 sec.
- ECK version: 2.5.0
- Kubernetes information: IBM Kubernetes Service.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:47:25Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"darwin/arm64"} Kustomize Version: v4.5.7 Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3+IKS", GitCommit:"16b9651762237ff35f832b596fde9dd428d8150d", GitTreeState:"clean", BuildDate:"2022-10-14T06:25:49Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
- Resource definition:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
EOF
```
I've found a workaround for this issue. I added a securityContext to run the initContainer as root, and it seems to detect this and run the chown step.
New Manifest
```shell
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
          runAsUser: 1000
          runAsGroup: 0
        initContainers:
        - name: elastic-internal-init-filesystem
          securityContext:
            runAsUser: 0
            runAsGroup: 0
EOF
```
The `set-default-security-context` ECK parameter, which defaults to `true`, is responsible for automatically adding `fsGroup: 1000` to the elasticsearch pod's `securityContext`, in order to make Kubernetes automatically change ownership on the data volume (see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods).
Can you double-check the value you are using?
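To check, one could inspect the pod-level securityContext that ECK actually applied (a sketch against a live cluster; the pod name is taken from the quickstart example):

```shell
# Expect to see fsGroup: 1000 here when set-default-security-context
# is effectively enabled for this cluster
kubectl get pod quickstart-es-default-0 \
  -o jsonpath='{.spec.securityContext}'
```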
Next, `DelegateFSGroupToCSIDriver` is a Kubernetes feature gate which delegates the ownership change to the CSI driver. It was alpha / `false` up to Kubernetes 1.22 and has been beta / `true` since Kubernetes 1.23. You should validate that your CSI driver doesn't have any known issue regarding this feature (some do, from my personal experience). On Kubernetes 1.23+, you can still force this feature gate to `false` on the various Kubernetes components.
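On a self-managed cluster, disabling the gate on the kubelet could look like the following sketch (the exact config file location depends on how kubelet is deployed; the gate also needs to be set on kube-apiserver, e.g. via `--feature-gates=DelegateFSGroupToCSIDriver=false`):

```yaml
# KubeletConfiguration fragment: turn off delegation of fsGroup
# ownership changes to the CSI driver
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DelegateFSGroupToCSIDriver: false
```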
Hi @jeanfabrice, my apologies for the delay in responding, and thank you for your input and direction.
ECK is installed with the chart's default settings, as I should have outlined in the original issue.
`DelegateFSGroupToCSIDriver` is enabled on my cluster, and I don't know of any issues with the feature on my provider. Do you have any info on the usual types of issues, or what I could search for or try to reproduce in this area? Thanks.
Hey @dobharweim!
I would first check whether `set-default-security-context` is enabled from an ECK perspective. If it is, your elasticsearch pods should have `securityContext.fsGroup: 1000` configured automatically.
To determine whether your CSI driver has an issue with `DelegateFSGroupToCSIDriver`, you can spin up a busybox pod with `securityContext.fsGroup: 1000` plus a mounted PVC, then see whether the PVC content gets updated with `group: 1000` ownership. If it does not, the delegation is at fault. If it does, it should work the same with elasticsearch pods.
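A minimal sketch of such a test pod (the claim name `test-pvc` is an assumption; point it at any PVC bound through the CSI driver under test):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
spec:
  securityContext:
    fsGroup: 1000
  containers:
  - name: busybox
    image: busybox
    # Print numeric ownership of the mounted volume, then exit;
    # with working fsGroup handling the group should show as 1000
    command: ["sh", "-c", "ls -lnd /data && ls -ln /data"]
    volumeMounts:
    - name: data
      mountPath: /data
  restartPolicy: Never
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc  # assumed: an existing PVC on the CSI driver under test
```

Checking the pod logs after it completes shows whether the group ownership was changed to 1000.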
> I expected the elasticsearch pod to start with an initContainer elastic-internal-init-filesystem which would prepare the mounted PVC for the data directory (elasticsearch-data) with the correct ownership and octal permissions.
Setting permissions requires the init container to run as `root`, which is not the case by default. As stated in the Kubernetes documentation, setting an `fsGroup` in the Pod `securityContext` should set the expected permissions without running a container with `runAsGroup: 0`:
> By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted.
For example:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  # uncomment the lines below to copy the specified node labels as pod annotations and use it as an environment variable in the Pods
  #annotations:
  #  eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
  name: elasticsearch-sample
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        securityContext:
          runAsUser: 3000
          runAsGroup: 0
          fsGroup: 3000
```
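The Kubernetes page linked above also describes `fsGroupChangePolicy`, which can skip the recursive chown when the volume root already has the expected ownership; a sketch of the same `podTemplate` with that policy (this field is not part of the original example):

```yaml
podTemplate:
  spec:
    securityContext:
      fsGroup: 1000
      # only recursively chown the volume when its root ownership
      # does not already match fsGroup
      fsGroupChangePolicy: "OnRootMismatch"
```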
> The set-default-security-context ECK parameter, which defaults to true, is responsible for automatically adding fsGroup: 1000 to the elasticsearch pod's securityContext,
Good point, but I think the doc is not up to date: the default value has been `auto-detect` (detection mechanism here) since 2.5.0 (see https://github.com/elastic/cloud-on-k8s/pull/5150/files).
I have no idea how it behaves on IBM Kubernetes Service. Is it a "flavor" of OpenShift?
Exact same issue with a local minikube cluster using ECK version 2.11.1; it can easily be reproduced with a `PersistentVolume` as follows:
- pv.yaml

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv-1
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/data/manual-pv-1"
```
- elasticsearch.yaml

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.12.0
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        # Uncomment to fix the issue
        #
        # securityContext:
        #   fsGroup: 1000
        #   runAsUser: 1000
        #   runAsGroup: 0
        # initContainers:
        # - name: elastic-internal-init-filesystem
        #   securityContext:
        #     runAsUser: 0
        #     runAsGroup: 0
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 2Gi
              cpu: 2
            limits:
              memory: 4Gi
              cpu: 8
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: manual
```