helm-charts
[BUG] Container doesn't start up after ungraceful termination
Describe the bug
OpenSearch 2.11 is running in Kubernetes with 3 pods, pretty vanilla installation with the latest Helm Chart 2.17.0. The pods terminate ungracefully (e.g., through a blackout of the cluster or a bug in the node eviction not respecting the graceful termination period).
After the pods come back up, one of the pods starts outputting errors:
{"type": "server", "timestamp": "2023-11-24T14:52:02,526Z", "level": "ERROR", "component": "o.o.b.OpenSearchUncaughtExceptionHandler", "cluster.name": "os", "node.name": "os-mngr-1", "message": "uncaught exception in thread [main]",
"stacktrace": ["org.opensearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/opensearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?",
...
It seems there are lock files on the PVC that prevent the container from starting:
/usr/share/opensearch/data/nodes/0/node.lock
/usr/share/opensearch/data/nodes/0/_state/write.lock
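To see which lock files are present on the volume before deciding what to do, the data directory can be walked with a short script. This is only a minimal sketch: `find_lock_files` is a hypothetical helper (not part of OpenSearch or the Helm chart), the two file names are taken from the paths above, and the script only lists files without deleting anything. It also cannot tell whether a lock is actually held by a running process; it reports only that the file exists.

```python
import os

# File names taken from the paths reported above.
LOCK_NAMES = {"node.lock", "write.lock"}


def find_lock_files(data_dir):
    """Return a sorted list of lock file paths found under data_dir."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name in LOCK_NAMES:
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # Default data path from the error message; adjust if the chart overrides it.
    for path in find_lock_files("/usr/share/opensearch/data"):
        print(path)
```

Run inside the affected pod (or a debug pod mounting the same PVC), this prints one path per lock file found.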
What is the recommended way of dealing with this situation? Deleting the lock files lets the container start, but unassigned shards remain, leaving the cluster in a yellow state. The only solution that has worked so far is to delete the PVC and the pod, let the StatefulSet recreate both, and wait for the missing index replicas to be rebuilt.
Shouldn't there be a way, or at least documentation, on how to proceed in such a case? Ideally with a less aggressive strategy than deleting the disk?
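One less aggressive thing to try before deleting the disk is to ask the cluster why the shards are unassigned and then retry allocation. The sketch below is a rough outline, not a confirmed fix for this issue: it uses the `_cat/shards`, `_cluster/allocation/explain`, and `_cluster/reroute?retry_failed=true` REST endpoints, and it assumes a hypothetical `http://localhost:9200` endpoint without TLS or authentication (a cluster with the security plugin enabled would need HTTPS and credentials). The helper names `unassigned_shards` and `build_request` are made up for illustration.

```python
import json
import urllib.request

# REST paths used for diagnosis and for retrying failed allocations.
CAT_SHARDS = "/_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason"
EXPLAIN = "/_cluster/allocation/explain"
RETRY_FAILED = "/_cluster/reroute?retry_failed=true"


def unassigned_shards(cat_shards_rows):
    """Keep only the rows of a _cat/shards JSON response that are UNASSIGNED."""
    return [row for row in cat_shards_rows if row.get("state") == "UNASSIGNED"]


def build_request(base_url, path, method="GET"):
    """Build (but do not send) a request against the cluster."""
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)


if __name__ == "__main__":
    base = "http://localhost:9200"  # assumption: port-forwarded, no TLS/auth

    with urllib.request.urlopen(build_request(base, CAT_SHARDS)) as resp:
        rows = json.load(resp)
    for row in unassigned_shards(rows):
        print(row)

    # With no request body, the explain API reports on an arbitrary unassigned
    # shard; it returns an error if there are none.
    with urllib.request.urlopen(build_request(base, EXPLAIN)) as resp:
        print(json.dumps(json.load(resp), indent=2))

    # Ask the cluster to retry allocations that previously failed.
    with urllib.request.urlopen(build_request(base, RETRY_FAILED, "POST")) as resp:
        print(resp.status)
```

The explain output usually names the reason a shard cannot be allocated, which helps decide whether a retry is enough or whether the replica really has to be rebuilt from scratch.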
I also tried asking the community for help, but no luck so far: https://forum.opensearch.org/t/unassigned-shards-after-killed-containers-blackout/16812
[Untriage] Hey @cpockrandt, thanks for reporting this bug. May I know if you are using NFS for the PVC?
Hey @prudhvigodithi, I used PVC.