cloud-on-k8s

elastic-internal-init-filesystem does not prepare the data directory

Open • dobharweim opened this issue 2 years ago • 6 comments

Bug Report

What did you do?

I installed the quickstart Elasticsearch cluster from the docs into a namespace managed by operator version 2.5.0.

What did you expect to see?

I expected the elasticsearch pod to start with an initContainer elastic-internal-init-filesystem which would prepare the mounted PVC for the data directory (elasticsearch-data) with the correct ownership and octal permissions.

What did you see instead? Under which circumstances?

Instead, the elastic-internal-init-filesystem container does not update the ownership of the mounted volume, so the volume is not writable. Elasticsearch fails with the following error (the elastic-internal-init-filesystem logs are further below):

k logs quickstart-es-default-0

{"@timestamp":"2022-11-09T12:37:41.381Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"quickstart-es-default-0","elasticsearch.cluster.name":"quickstart","error.type":"java.lang.IllegalStateException","error.message":"failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?","error.stack_trace":"java.lang.IllegalStateException: failed to obtain node locks, tried [/usr/share/elasticsearch/data]; maybe these locations are not writable or multiple nodes were started on the same data path?\n\tat org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:285)\n\tat org.elasticsearch.node.Node.<init>(Node.java:469)\n\tat org.elasticsearch.node.Node.<init>(Node.java:316)\n\tat org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:214)\n\tat org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:214)\n\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\nCaused by: java.io.IOException: failed to obtain lock on /usr/share/elasticsearch/data\n\tat org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:230)\n\tat org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:198)\n\tat org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:277)\n\t... 5 more\nCaused by: java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/node.lock\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:825)\n\tat org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:94)\n\tat org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:43)\n\tat org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:44)\n\tat org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:223)\n\t... 7 more\n\tSuppressed: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/node.lock\n\t\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)\n\t\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\t\tat java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)\n\t\tat java.base/java.nio.file.Files.newByteChannel(Files.java:380)\n\t\tat java.base/java.nio.file.Files.createFile(Files.java:658)\n\t\tat org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:84)\n\t\t... 10 more\n"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/quickstart.log
{"timestamp": "2022-11-09T12:37:41+00:00", "message": "readiness probe failed", "curl_rc": "7"}

ERROR: Elasticsearch exited unexpectedly

k logs quickstart-es-default-0 elastic-internal-init-filesystem

Starting init script
Linking /mnt/elastic-internal/xpack-file-realm/users to /usr/share/elasticsearch/config/users
Linking /mnt/elastic-internal/xpack-file-realm/roles.yml to /usr/share/elasticsearch/config/roles.yml
Linking /mnt/elastic-internal/xpack-file-realm/users_roles to /usr/share/elasticsearch/config/users_roles
Linking /mnt/elastic-internal/elasticsearch-config/elasticsearch.yml to /usr/share/elasticsearch/config/elasticsearch.yml
Linking /mnt/elastic-internal/unicast-hosts/unicast_hosts.txt to /usr/share/elasticsearch/config/unicast_hosts.txt
Linking /mnt/elastic-internal/xpack-file-realm/service_tokens to /usr/share/elasticsearch/config/service_tokens
File linking duration: 0 sec.
Copying /usr/share/elasticsearch/config/* to /mnt/elastic-internal/elasticsearch-config-local/
'/usr/share/elasticsearch/config/elasticsearch-plugins.example.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch-plugins.example.yml'
'/usr/share/elasticsearch/config/elasticsearch.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/elasticsearch.yml'
'/usr/share/elasticsearch/config/http-certs' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs'
'/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674'
'/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674/ca.crt'
'/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674/tls.crt'
'/usr/share/elasticsearch/config/http-certs/..2022_11_09_12_24_31.1370307674/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..2022_11_09_12_24_31.1370307674/tls.key'
'/usr/share/elasticsearch/config/http-certs/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/..data'
'/usr/share/elasticsearch/config/http-certs/tls.key' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.key'
'/usr/share/elasticsearch/config/http-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/ca.crt'
'/usr/share/elasticsearch/config/http-certs/tls.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/http-certs/tls.crt'
'/usr/share/elasticsearch/config/jvm.options' -> '/mnt/elastic-internal/elasticsearch-config-local/jvm.options'
'/usr/share/elasticsearch/config/jvm.options.d' -> '/mnt/elastic-internal/elasticsearch-config-local/jvm.options.d'
'/usr/share/elasticsearch/config/log4j2.file.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.file.properties'
'/usr/share/elasticsearch/config/log4j2.properties' -> '/mnt/elastic-internal/elasticsearch-config-local/log4j2.properties'
'/usr/share/elasticsearch/config/role_mapping.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/role_mapping.yml'
'/usr/share/elasticsearch/config/roles.yml' -> '/mnt/elastic-internal/elasticsearch-config-local/roles.yml'
'/usr/share/elasticsearch/config/service_tokens' -> '/mnt/elastic-internal/elasticsearch-config-local/service_tokens'
'/usr/share/elasticsearch/config/transport-remote-certs' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs'
'/usr/share/elasticsearch/config/transport-remote-certs/..2022_11_09_12_24_31.2343229866' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..2022_11_09_12_24_31.2343229866'
'/usr/share/elasticsearch/config/transport-remote-certs/..2022_11_09_12_24_31.2343229866/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..2022_11_09_12_24_31.2343229866/ca.crt'
'/usr/share/elasticsearch/config/transport-remote-certs/..data' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/..data'
'/usr/share/elasticsearch/config/transport-remote-certs/ca.crt' -> '/mnt/elastic-internal/elasticsearch-config-local/transport-remote-certs/ca.crt'
'/usr/share/elasticsearch/config/unicast_hosts.txt' -> '/mnt/elastic-internal/elasticsearch-config-local/unicast_hosts.txt'
'/usr/share/elasticsearch/config/users' -> '/mnt/elastic-internal/elasticsearch-config-local/users'
'/usr/share/elasticsearch/config/users_roles' -> '/mnt/elastic-internal/elasticsearch-config-local/users_roles'
Empty dir /usr/share/elasticsearch/plugins
Copying /usr/share/elasticsearch/bin/* to /mnt/elastic-internal/elasticsearch-bin-local/
'/usr/share/elasticsearch/bin/elasticsearch' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch'
'/usr/share/elasticsearch/bin/elasticsearch-certgen' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-certgen'
'/usr/share/elasticsearch/bin/elasticsearch-certutil' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-certutil'
'/usr/share/elasticsearch/bin/elasticsearch-cli' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-cli'
'/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-create-enrollment-token'
'/usr/share/elasticsearch/bin/elasticsearch-croneval' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-croneval'
'/usr/share/elasticsearch/bin/elasticsearch-env' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-env'
'/usr/share/elasticsearch/bin/elasticsearch-env-from-file' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-env-from-file'
'/usr/share/elasticsearch/bin/elasticsearch-geoip' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-geoip'
'/usr/share/elasticsearch/bin/elasticsearch-keystore' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-keystore'
'/usr/share/elasticsearch/bin/elasticsearch-node' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-node'
'/usr/share/elasticsearch/bin/elasticsearch-plugin' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-plugin'
'/usr/share/elasticsearch/bin/elasticsearch-reconfigure-node' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-reconfigure-node'
'/usr/share/elasticsearch/bin/elasticsearch-reset-password' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-reset-password'
'/usr/share/elasticsearch/bin/elasticsearch-saml-metadata' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-saml-metadata'
'/usr/share/elasticsearch/bin/elasticsearch-service-tokens' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-service-tokens'
'/usr/share/elasticsearch/bin/elasticsearch-setup-passwords' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-setup-passwords'
'/usr/share/elasticsearch/bin/elasticsearch-shard' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-shard'
'/usr/share/elasticsearch/bin/elasticsearch-sql-cli' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-sql-cli'
'/usr/share/elasticsearch/bin/elasticsearch-sql-cli-8.5.0.jar' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-sql-cli-8.5.0.jar'
'/usr/share/elasticsearch/bin/elasticsearch-syskeygen' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-syskeygen'
'/usr/share/elasticsearch/bin/elasticsearch-users' -> '/mnt/elastic-internal/elasticsearch-bin-local/elasticsearch-users'
Files copy duration: 0 sec.
chown duration: 0 sec.
waiting for the transport certificates (/mnt/elastic-internal/transport-certificates/quickstart-es-default-0.tls.key)
wait duration: 1 sec.
Linking /usr/share/elasticsearch/config/transport-certs/quickstart-es-default-0.tls.crt to /mnt/elastic-internal/elasticsearch-config-local/node-transport-cert/transport.tls.crt
Linking /usr/share/elasticsearch/config/transport-certs/quickstart-es-default-0.tls.crt to /mnt/elastic-internal/elasticsearch-config-local/node-transport-cert/transport.tls.crt
Certs linking duration: 0 sec.
Init script successful
Script duration: 1 sec.

  • ECK version:

    2.5.0

  • Kubernetes information:

    IBM Kubernetes Service.

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:47:25Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3+IKS", GitCommit:"16b9651762237ff35f832b596fde9dd428d8150d", GitTreeState:"clean", BuildDate:"2022-10-14T06:25:49Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

  • Resource definition:
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
EOF

dobharweim • Nov 09 '22

I've found a workaround for this issue: I added a securityContext that runs the initContainer as root, and the init script seems to detect this and run its chown step.

New manifest:

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
          runAsUser: 1000
          runAsGroup: 0
        initContainers:
        - name: elastic-internal-init-filesystem
          securityContext:
            runAsUser: 0
            runAsGroup: 0
EOF
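To confirm that the workaround took effect, one option is to look at the numeric owner and mode of the data directory from inside the running Elasticsearch container, and at what the init container reported for its chown step. This is a sketch assuming the pod and container names from the manifest above (`quickstart-es-default-0`, container `elasticsearch`) and that `ls` and `grep` are available; `kubectl exec` only works once the container is actually running.

```shell
# Pod name from the quickstart manifest in this thread; adjust as needed.
POD=quickstart-es-default-0

# Numeric UID/GID and mode of the data directory as Elasticsearch sees it.
kubectl exec "$POD" -c elasticsearch -- ls -ldn /usr/share/elasticsearch/data

# Lines the init script printed about its chown step (e.g. "chown duration").
kubectl logs "$POD" -c elastic-internal-init-filesystem | grep -i chown
```

With the original manifest, the init log above shows only "chown duration: 0 sec." and no per-directory output, which fits the init container not running as root.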

dobharweim • Nov 09 '22

The set-default-security-context ECK parameter, which defaults to true, is responsible for automatically adding fsGroup: 1000 to the Elasticsearch pod's securityContext so that Kubernetes automatically changes ownership of the data volume (see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods).

Can you double check the value you are using?

Next, DelegateFSGroupToCSIDriver is a Kubernetes feature gate that delegates the ownership change to the CSI driver. It was alpha (disabled by default) up to Kubernetes 1.22 and has been beta (enabled by default) since Kubernetes 1.23. You should check that your CSI driver has no known issues with this feature (in my personal experience, some do). On Kubernetes 1.23+, you can still force this feature gate to false on the various Kubernetes components.
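A first step in checking the CSI driver is finding which driver provisions the data volume and which `fsGroupPolicy` it advertises. This is a sketch under assumptions: the PVC name follows the usual StatefulSet convention `<claim-template>-<pod>` (here `elasticsearch-data-quickstart-es-default-0`), and the storage class points at a CSI provisioner registered as a `CSIDriver` object. A policy of `None` means Kubernetes mounts volumes from that driver without applying `fsGroup`.

```shell
# Assumed PVC name (claim template "elasticsearch-data" + pod name).
PVC=elasticsearch-data-quickstart-es-default-0

# Storage class of the data volume, then the provisioner behind it.
SC=$(kubectl get pvc "$PVC" -o jsonpath='{.spec.storageClassName}')
kubectl get storageclass "$SC" -o jsonpath='{.provisioner}{"\n"}'

# fsGroupPolicy of each registered CSI driver
# (File, ReadWriteOnceWithFSType or None).
kubectl get csidriver \
  -o custom-columns=NAME:.metadata.name,FSGROUPPOLICY:.spec.fsGroupPolicy
```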

jeanfabrice • Nov 16 '22

Hi @jeanfabrice, my apologies for the delay in responding, and thank you for your input and direction.

ECK is installed with the Helm chart's default settings, which I should have mentioned in the original issue.

DelegateFSGroupToCSIDriver is enabled on my cluster, and I'm not aware of any known issues with this feature at my provider. Do you have any pointers on the usual types of issues, or on what I could search for or try to reproduce in this area? Thanks.

dobharweim • Dec 05 '22

Hey @dobharweim! I would first check whether set-default-security-context is enabled or not from an ECK perspective. If it is, your elasticsearch pods should normally have the securityContext.fsGroup: 1000 automatically configured.
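Both checks can be done with kubectl. This sketch assumes the operator runs in the `elastic-system` namespace with its configuration in a ConfigMap named `elastic-operator` (the usual layout for the standard manifests and Helm chart, but verify it on your install), and reuses the pod name from this thread. If `grep` prints nothing, the key is not set explicitly and the operator default applies.

```shell
# Operator configuration: look for the set-default-security-context key.
kubectl -n elastic-system get configmap elastic-operator -o yaml \
  | grep -i set-default-security-context

# Pod-level securityContext actually applied to the Elasticsearch pod;
# fsGroup: 1000 is expected here when the default security context is set.
kubectl get pod quickstart-es-default-0 -o jsonpath='{.spec.securityContext}{"\n"}'
```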

To determine whether or not your CSI driver has an issue with DelegateFSGroupToCSIDriver, you can spin up a busybox pod with securityContext.fsGroup: 1000 plus a mounted PVC, then check whether the PVC content gets group 1000 ownership. If it does not, the delegation is at fault. If it does, it should work the same way with Elasticsearch pods.
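A minimal version of that test is sketched below. The names `fsgroup-test-pvc` and `fsgroup-test` are hypothetical; the PVC omits `storageClassName` so it uses the cluster's default class, like the quickstart manifest does. In the `ls -ldn /data` output, the fourth column is the numeric group: 1000 there, plus a successful `touch`, suggests fsGroup is being applied.

```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsgroup-test-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  containers:
  - name: busybox
    image: busybox
    # Show numeric owner/group of the mount, then try to write to it.
    command: ["sh", "-c", "ls -ldn /data && touch /data/probe && ls -ln /data; sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: fsgroup-test-pvc
EOF

kubectl wait --for=condition=Ready pod/fsgroup-test --timeout=120s
kubectl logs fsgroup-test

# Clean up afterwards.
kubectl delete pod fsgroup-test && kubectl delete pvc fsgroup-test-pvc
```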

jeanfabrice • Dec 06 '22

> I expected the elasticsearch pod to start with an initContainer elastic-internal-init-filesystem which would prepare the mounted PVC for the data directory (elasticsearch-data) with the correct ownership and octal permissions.

Setting permissions requires the init container to run as root, which is not the case by default. As stated in the Kubernetes documentation, setting fsGroup in the Pod securityContext should set the expected permissions without running any container with runAsUser: 0:

> By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted.

For example:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  # uncomment the lines below to copy the specified node labels as pod annotations and use it as an environment variable in the Pods
  #annotations:
  #  eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
  name: elasticsearch-sample
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        securityContext:
          runAsUser: 3000
          runAsGroup: 0
          fsGroup: 3000
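To illustrate what the recursive change quoted above does to each file's mode bits, here is a small bash sketch. It is an approximation for illustration only, assuming the kubelet behaves as follows when applying fsGroup: the group owner becomes the fsGroup, group read/write is added (group read only, for read-only mounts), and directories also get owner/group execute plus the setgid bit so new files inherit the group. The function name `fsgroup_mode` is hypothetical; it only computes modes and touches no files.

```shell
# fsgroup_mode MODE IS_DIR READONLY
# Prints, in octal, the mode a file would end up with after an fsGroup
# ownership change, under the assumptions described above:
#   - add group rw (0660), or group r (0440) on read-only volumes
#   - directories additionally get execute (0110) and setgid (02000)
fsgroup_mode() {
  local mode=$((8#$1)) is_dir=$2 readonly=$3
  local mask=$((8#660))
  if [[ $readonly == 1 ]]; then
    mask=$((8#440))
  fi
  if [[ $is_dir == 1 ]]; then
    mask=$(( mask | 8#2000 | 8#110 ))
  fi
  printf '%o\n' $(( mode | mask ))
}

fsgroup_mode 755 1 0   # root-created directory on a fresh volume
fsgroup_mode 644 0 0   # regular file
fsgroup_mode 600 0 1   # file on a read-only mount
```

For a root-owned 755 data directory this gives 2775 with group 1000, and since fsGroup is also added to the containers' supplementary groups, a non-root Elasticsearch process could then create node.lock there.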

> The set-default-security-context ECK parameter, which defaults to true, is responsible for automatically adding fsGroup: 1000 to the elasticsearch pod's securityContext [...]

Good point, but I think the doc is not up to date: the default value has been auto-detect (detection mechanism here) since 2.5.0 (see https://github.com/elastic/cloud-on-k8s/pull/5150/files). I have no idea how it behaves on IBM Kubernetes Service. Is it a "flavor" of OpenShift?
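If auto-detection picks the wrong behaviour on a given platform, the value can presumably be forced explicitly. This sketch assumes the operator was installed from the `elastic/eck-operator` Helm chart as a release named `elastic-operator` in `elastic-system`, and that the chart value `config.setDefaultSecurityContext` maps to the set-default-security-context flag; check both against the `values.yaml` of your chart version before applying.

```shell
# Assumed release name, namespace and chart value; adjust to your install.
# --set-string keeps "true" a string, matching the "auto-detect" default type.
helm upgrade elastic-operator elastic/eck-operator \
  -n elastic-system \
  --reuse-values \
  --set-string config.setDefaultSecurityContext=true
```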

barkbay • Jan 18 '23

I see exactly the same issue on a local minikube cluster with ECK 2.11.1. It can be easily reproduced with a PersistentVolume as follows:

  • pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv-1
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/manual-pv-1"

  • elasticsearch.yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.12.0
  nodeSets:
    - name: default
      count: 1
      podTemplate:
        spec:
          # Uncomment to fix the issue
          #
          # securityContext:
          #   fsGroup: 1000
          #   runAsUser: 1000
          #   runAsGroup: 0
          # initContainers:
          #   - name: elastic-internal-init-filesystem
          #     securityContext:
          #       runAsUser: 0
          #       runAsGroup: 0
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 2Gi
                  cpu: 2
                limits:
                  memory: 4Gi
                  cpu: 8
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
            storageClassName: manual
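For this minikube reproduction specifically, the PV is a `hostPath` volume, and as far as I know the kubelet does not apply `fsGroup` ownership changes to `hostPath` volumes, so the directory keeps whatever owner it was created with on the node (often root). A sketch of an alternative to running the init container as root is to prepare the directory on the minikube node beforehand, assuming UID 1000 and GID 0 as in the workaround manifests in this thread:

```shell
# Create the hostPath directory on the minikube node and hand it to
# UID 1000 / GID 0 (assumed to match the Elasticsearch container user).
minikube ssh "sudo mkdir -p /data/manual-pv-1 && sudo chown -R 1000:0 /data/manual-pv-1"

# Verify numeric ownership before (re)creating the Elasticsearch resource.
minikube ssh "ls -ldn /data/manual-pv-1"
```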

usersina • Feb 13 '24