GKE/EKS: OpenSearch on hosted Container Services

Open getkub opened this issue 3 years ago • 12 comments

Describe the bug: We tried to install Shuffle in a lab environment. The backend, frontend, and orborus have all started, but OpenSearch has not.

To Reproduce

  opensearch:
    image: opensearchproject/opensearch:1.2.1
    hostname: shuffle-opensearch
    container_name: shuffle-opensearch
..

...

Debug logs (NOT APPLICABLE FOR CLOUD): Run the following commands and paste them

Enabling OpenSearch Security Plugin
[2021-12-20T16:04:24,236][WARN ][o.o.b.JNANatives         ] [shuffle-opensearch] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
[2021-12-20T16:04:24,238][WARN ][o.o.b.JNANatives         ] [shuffle-opensearch] This can result in part of the JVM being swapped out.
[2021-12-20T16:04:24,238][WARN ][o.o.b.JNANatives         ] [shuffle-opensearch] Increase RLIMIT_MEMLOCK, soft limit: 16777216, hard limit: 16777216
[2021-12-20T16:04:24,239][WARN ][o.o.b.JNANatives         ] [shuffle-opensearch] These can be adjusted by modifying /etc/security/limits.conf, for example:
        # allow user 'opensearch' mlockall
        opensearch soft memlock unlimited
        opensearch hard memlock unlimited
[2021-12-20T16:04:24,239][WARN ][o.o.b.JNANatives         ] [shuffle-opensearch] If you are logged in interactively, you will have to re-login for the new limits to take effect.
[2021-12-20T16:04:24,406][INFO ][o.o.n.Node               ] [shuffle-opensearch] version[1.2.1], pid[103], build[tar/e3a44fa71b290fb265a94ef4297f044b9a63a762/2021-12-11T04:22:52.398139Z], OS[Linux/5.4.120+/amd64], JVM[AdoptOpenJDK/OpenJDK 64-Bit Server VM/15.0.1/15.0.1+9]
...

[2021-12-20T14:03:27,246][INFO ][o.o.p.h.c.PerformanceAnalyzerConfigAction] [shuffle-opensearch] PerformanceAnalyzer Enabled: false
[2021-12-20T14:03:27,299][INFO ][o.o.n.Node               ] [shuffle-opensearch] initialized
[2021-12-20T14:03:27,300][INFO ][o.o.n.Node               ] [shuffle-opensearch] starting ...
[2021-12-20T14:03:27,407][INFO ][o.o.t.TransportService   ] [shuffle-opensearch] publish_address {10.32.0.203:9300}, bound_addresses {0.0.0.0:9300}
[2021-12-20T14:03:27,548][INFO ][o.o.b.BootstrapChecks    ] [shuffle-opensearch] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: memory locking requested for opensearch process but memory is not locked
ERROR: OpenSearch did not exit normally - check the logs at /usr/share/opensearch/logs/shuffle-cluster.log
[2021-12-20T14:03:27,554][INFO ][o.o.n.Node               ] [shuffle-opensearch] stopping ...
[2021-12-20T14:03:27,565][INFO ][o.o.n.Node               ] [shuffle-opensearch] stopped
[2021-12-20T14:03:27,565][INFO ][o.o.n.Node               ] [shuffle-opensearch] closing ...
[2021-12-20T14:03:27,573][INFO ][o.o.n.Node               ] [shuffle-opensearch] closed
Killing performance analyzer process 127
OpenSearch exited with code 78
Performance analyzer exited with code 143

Searching for the error "OpenSearch exited with code 78" turns up Elasticsearch results that attribute it to vm.max_map_count being too small. We logged in to the container in the pod and tried to run sysctl -w vm.max_map_count=262144 as root, but the sysctl utility does not seem to be part of the opensearch image.

PS: No logs are written to /usr/share/opensearch/logs/shuffle-cluster.log, even though the error message says to check there.

getkub avatar Dec 20 '21 14:12 getkub

Set bootstrap.memory_lock: 'false'
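
A minimal sketch of what this could look like in the compose file from the report. It assumes the setting is passed as an environment variable, which the official OpenSearch image accepts; the rest of the service definition was elided in the report and should stay as it is in your own file.

```yaml
# Sketch only: merge into the existing opensearch service definition.
services:
  opensearch:
    image: opensearchproject/opensearch:1.2.1
    hostname: shuffle-opensearch
    container_name: shuffle-opensearch
    environment:
      # Do not request memory locking, so the memory-lock
      # bootstrap check no longer fails at startup.
      - bootstrap.memory_lock=false
```

With memory locking off, parts of the JVM heap may be swapped out, as the warnings in the log point out.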

azgaviperr avatar Dec 20 '21 16:12 azgaviperr

What he said: setting that as an environment variable is a workaround! Either that, or set vm.max_map_count=262144 on the node (k8s) / host, and NOT within the container itself. To make it persist through restarts, set it in the /etc/sysctl.conf file as well.

We recommend the latter for production environments.
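
On hosted Kubernetes, where logging in to the node is often not practical, a common pattern is a privileged init container that sets the value. Since vm.max_map_count is a node-wide kernel setting, this changes it on the node the pod lands on. A sketch under assumptions: the container name and the busybox image are illustrative and not from this thread, and the cluster must allow privileged containers.

```yaml
# Sketch only: fragment of the OpenSearch pod template, not a complete manifest.
spec:
  initContainers:
    - name: set-max-map-count      # hypothetical name
      image: busybox:1.36          # assumption: any image that ships sysctl
      command: ["sysctl", "-w", "vm.max_map_count=262144"]
      securityContext:
        privileged: true           # needed to change a node-level sysctl
```

The init container runs again every time the pod is scheduled, so the value is reapplied after a node is replaced or rebooted.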

frikky avatar Dec 20 '21 16:12 frikky

Setting vm.max_map_count=262144 sometimes doesn't work if you have multiple instances of Elasticsearch/OpenSearch on the host.

azgaviperr avatar Dec 20 '21 16:12 azgaviperr

Setting vm.max_map_count=262144 at node/host level didn't work. Will try out the bootstrap.memory_lock: 'false' next

getkub avatar Dec 20 '21 16:12 getkub

Thanks, setting it to 'false' worked. But now the error is about setting the shuffle-database permissions to 1000:1000. How do we change permissions in Kubernetes (i.e. on systems without filesystem access)? Any chance to inject it as an env value in the YAML or docker-compose file?

getkub avatar Dec 20 '21 17:12 getkub

That's indeed a good question, as there needs to be some kind of filesystem running. Are you deploying on e.g. AWS EKS? Where should the data be stored?

The base example is indeed to mount a folder and give access that way.

frikky avatar Dec 20 '21 17:12 frikky

Yes, it runs on EKS/GKE etc. The data is stored as per the volumeMounts specified: ${DB_LOCATION}:/usr/share/opensearch/data

Is it better to do this with "initContainers", or is there a chance to update securityContext?

getkub avatar Dec 20 '21 17:12 getkub

I think you've reached the limit of my K8s knowledge - especially since this is in a hosted environment. I think this blogpost may be of a lot of help with storage utilities: https://medium.com/google-cloud/a-guide-to-deploy-elasticsearch-cluster-on-google-kubernetes-engine-52f67743ee98.

Keep in mind that OpenSearch and Elasticsearch are VERY close to being the same software, meaning you can search for Elasticsearch to find solutions.
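
For reference, the two options raised in the question (securityContext versus initContainers) can be sketched roughly as follows. This is a sketch under assumptions, not a tested Shuffle manifest: the init container name, the busybox image, the volume name, and the claim name are hypothetical, and the 1000:1000 owner comes from the permission error described above. Normally one of the two options is enough.

```yaml
# Sketch only: fragment of a Pod/StatefulSet template, not a complete manifest.
spec:
  securityContext:
    # Option 1: ask Kubernetes to make the volume group-owned by GID 1000.
    # Supported by many, but not all, volume types.
    fsGroup: 1000
  initContainers:
    # Option 2: chown the data directory before OpenSearch starts.
    - name: fix-data-permissions          # hypothetical name
      image: busybox:1.36                 # assumption: any image with chown
      command: ["sh", "-c", "chown -R 1000:1000 /usr/share/opensearch/data"]
      securityContext:
        runAsUser: 0                      # chown needs root
      volumeMounts:
        - name: shuffle-database          # hypothetical volume name
          mountPath: /usr/share/opensearch/data
  containers:
    - name: shuffle-opensearch
      image: opensearchproject/opensearch:1.2.1
      volumeMounts:
        - name: shuffle-database
          mountPath: /usr/share/opensearch/data
  volumes:
    - name: shuffle-database
      persistentVolumeClaim:
        claimName: shuffle-database-pvc   # hypothetical claim name
```

The fsGroup route needs no extra container, while the init container works even where the storage driver ignores fsGroup, at the cost of running a step as root.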

frikky avatar Dec 20 '21 17:12 frikky

@dhaval055 - please check if this is a clearer possibility now, and how we may deploy to eks and similar.

frikky avatar Jun 14 '23 10:06 frikky

@dhaval055:

Please check out this for Opensearch as well:

  • Azure Container Instances

frikky avatar Sep 06 '23 09:09 frikky

@dhaval055 input?

0x0elliot avatar Oct 25 '23 12:10 0x0elliot

Got OpenSearch running on Azure Container Instances. Haven't gotten to test on GKE/EKS yet.

dhaval055 avatar Oct 26 '23 04:10 dhaval055