Shuffle
GKE/EKS: OpenSearch on hosted Container Services
Describe the bug
We tried to install Shuffle in a lab environment. The backend, frontend, and orborus have all started, but opensearch has not.
To Reproduce
opensearch:
  image: opensearchproject/opensearch:1.2.1
  hostname: shuffle-opensearch
  container_name: shuffle-opensearch
  ..
  ...
Debug logs (NOT APPLICABLE FOR CLOUD)
Enabling OpenSearch Security Plugin
[2021-12-20T16:04:24,236][WARN ][o.o.b.JNANatives ] [shuffle-opensearch] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
[2021-12-20T16:04:24,238][WARN ][o.o.b.JNANatives ] [shuffle-opensearch] This can result in part of the JVM being swapped out.
[2021-12-20T16:04:24,238][WARN ][o.o.b.JNANatives ] [shuffle-opensearch] Increase RLIMIT_MEMLOCK, soft limit: 16777216, hard limit: 16777216
[2021-12-20T16:04:24,239][WARN ][o.o.b.JNANatives ] [shuffle-opensearch] These can be adjusted by modifying /etc/security/limits.conf, for example:
# allow user 'opensearch' mlockall
opensearch soft memlock unlimited
opensearch hard memlock unlimited
[2021-12-20T16:04:24,239][WARN ][o.o.b.JNANatives ] [shuffle-opensearch] If you are logged in interactively, you will have to re-login for the new limits to take effect.
[2021-12-20T16:04:24,406][INFO ][o.o.n.Node ] [shuffle-opensearch] version[1.2.1], pid[103], build[tar/e3a44fa71b290fb265a94ef4297f044b9a63a762/2021-12-11T04:22:52.398139Z], OS[Linux/5.4.120+/amd64], JVM[AdoptOpenJDK/OpenJDK 64-Bit Server VM/15.0.1/15.0.1+9]
...
[2021-12-20T14:03:27,246][INFO ][o.o.p.h.c.PerformanceAnalyzerConfigAction] [shuffle-opensearch] PerformanceAnalyzer Enabled: false
[2021-12-20T14:03:27,299][INFO ][o.o.n.Node ] [shuffle-opensearch] initialized
[2021-12-20T14:03:27,300][INFO ][o.o.n.Node ] [shuffle-opensearch] starting ...
[2021-12-20T14:03:27,407][INFO ][o.o.t.TransportService ] [shuffle-opensearch] publish_address {10.32.0.203:9300}, bound_addresses {0.0.0.0:9300}
[2021-12-20T14:03:27,548][INFO ][o.o.b.BootstrapChecks ] [shuffle-opensearch] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: memory locking requested for opensearch process but memory is not locked
ERROR: OpenSearch did not exit normally - check the logs at /usr/share/opensearch/logs/shuffle-cluster.log
[2021-12-20T14:03:27,554][INFO ][o.o.n.Node ] [shuffle-opensearch] stopping ...
[2021-12-20T14:03:27,565][INFO ][o.o.n.Node ] [shuffle-opensearch] stopped
[2021-12-20T14:03:27,565][INFO ][o.o.n.Node ] [shuffle-opensearch] closing ...
[2021-12-20T14:03:27,573][INFO ][o.o.n.Node ] [shuffle-opensearch] closed
Killing performance analyzer process 127
OpenSearch exited with code 78
Performance analyzer exited with code 143
When searching for the "OpenSearch exited with code 78" error, the results for Elasticsearch say it is related to vm.max_map_count being too small.
We tried to log in to the container (pod) itself and run sysctl -w vm.max_map_count=262144 as root, but the sysctl utility does not seem to be part of the opensearch image.
PS: NO logs are produced in /usr/share/opensearch/logs/shuffle-cluster.log, even though the error message says to check there.
Set bootstrap.memory_lock: 'false'
What he said: setting that as an environment variable is a workaround. Alternatively, set vm.max_map_count=262144 on the node (k8s) / host, and NOT within the container itself. To make it persist through restarts, set it in the /etc/sysctl.conf file as well.
We recommend the latter for production environments.
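As a minimal sketch of the environment-variable workaround, assuming the docker-compose service from the report above: the OpenSearch image reads settings passed as environment variables. The commented-out ulimits block is an alternative that allows memory locking instead of disabling it; whether the container runtime lets you raise that limit is an assumption, and hosted Kubernetes may not.

```yaml
services:
  opensearch:
    image: opensearchproject/opensearch:1.2.1
    hostname: shuffle-opensearch
    container_name: shuffle-opensearch
    environment:
      # Workaround: do not request memory locking, so the
      # "memory is not locked" bootstrap check no longer applies.
      - bootstrap.memory_lock=false
    # Alternative (assumption: the runtime allows raising memlock):
    # keep bootstrap.memory_lock=true and lift the limit instead.
    # ulimits:
    #   memlock:
    #     soft: -1
    #     hard: -1
```

Only the environment entry is needed for the workaround; the rest of the service definition stays as it is in your own compose file.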
Setting vm.max_map_count=262144 sometimes doesn't work if you have multiple instances of Elasticsearch/OpenSearch on the host.
Setting vm.max_map_count=262144 at node/host level didn't work. Will try out bootstrap.memory_lock: 'false' next.
Thanks, setting it to 'false' worked. But now the error is about the permissions of shuffle-database, which needs to be owned by 1000:1000.
How do we change permissions in Kubernetes (i.e. on systems without filesystem access)? Is there any chance to inject it as an env value in the YAML or docker-compose file?
That's indeed a good question, as there needs to be some kind of filesystem running. Are you deploying on e.g. AWS EKS? Where should the data be stored?
The base example does indeed mount a folder and give access that way.
Yes, it is run on EKS/GKE etc. The data is stored as per the volumeMounts specified:
${DB_LOCATION}:/usr/share/opensearch/data
Is it better to do this with "initContainers", or is there a chance to update securityContext?
I think you've reached the limit of my K8s knowledge, especially since this is in a hosted environment. I think this blog post may be of a lot of help with storage utilities: https://medium.com/google-cloud/a-guide-to-deploy-elasticsearch-cluster-on-google-kubernetes-engine-52f67743ee98.
Keep in mind that OpenSearch and Elasticsearch are VERY close to the same software, meaning you can search for Elasticsearch to find solutions.
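For the initContainers vs. securityContext question, here is an untested sketch of how both are commonly combined for Elasticsearch-style images on Kubernetes. It is not a verified Shuffle deployment. The claim name shuffle-opensearch-data, the labels, and the busybox image are hypothetical placeholders; the 1000:1000 owner and the data path come from the thread above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shuffle-opensearch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: shuffle-opensearch
  template:
    metadata:
      labels:
        app: shuffle-opensearch
    spec:
      # Option 1: ask Kubernetes to make the volume group-owned by GID 1000.
      # Only takes effect for volume types that support ownership management.
      securityContext:
        fsGroup: 1000
      # Option 2: chown the data directory as root before OpenSearch starts.
      initContainers:
        - name: fix-data-permissions
          image: busybox:1.36   # hypothetical choice; any image with chown works
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/opensearch/data"]
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: data
              mountPath: /usr/share/opensearch/data
      containers:
        - name: opensearch
          image: opensearchproject/opensearch:1.2.1
          volumeMounts:
            - name: data
              mountPath: /usr/share/opensearch/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: shuffle-opensearch-data   # hypothetical PVC name
```

Usually one of the two options is enough. fsGroup is the less intrusive choice where the storage class honors it, while the init container also covers volume types that ignore fsGroup.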
@dhaval055 - please check if this is a clearer possibility now, and how we may deploy to eks and similar.
@dhaval055:
Please check out this for Opensearch as well:
- Azure Container Instances
@dhaval055 input?
Got OpenSearch running on Azure Container Instances. Haven't gotten to test on GKE/EKS yet.