
OpenSearch Dashboards failures after upgrade from 2.9 to 2.12

Open · rlevytskyi opened this issue 1 year ago • 5 comments

Dashboards Suddenly Dies Hello OpenSearch Team, we’ve just updated our OpenSearch cluster from version 2.9.0 to 2.12.0. Among other issues, we’ve noticed that the OpenSearch Dashboards container sometimes gets unexpectedly stopped. There is no error message in its log, but there are several entries in the system log like these (I’ve trimmed them slightly):

vm85 dockerd[1011]: msg="ignoring event" container=e490 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
vm85 containerd[902]: msg="shim disconnected" id=e490
vm85 containerd[902]: msg="cleaning up after shim disconnecte d" id=e490 namespace=moby
vm85 containerd[902]: msg="cleaning up dead shim"
vm85 containerd[902]: msg="cleanup warnings time=\"2024-02-23 T14:27:19Z\" level=info msg=\"starting signal loop\" namespace=moby pid=11722 runtime=io.containerd.runc.v2\n" 
vm85 dockerd[1011]: msg="ShouldRestart failed, container will not be restarted" container=e490 daemonShuttingDown=false error="restart c anceled" execDuration=10m7.524639324s exitStatus="{0 2024-02-23 14:27:18.998984252 +0000 UTC}" hasBeenManuallyStopped=true restartCount =4
vm85 containerd[902]: msg="loading plugin \"io.containerd.event.  v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
vm85 containerd[902]: msg="loading plugin \"io.containerd.intern al.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
vm85 containerd[902]: msg="loading plugin \"io.containerd.ttrpc.  v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
vm85 containerd[902]: msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/e490 pid=11753 runt ime=io.containerd.runc.v2

I managed to fix this by uncommenting and changing the following line in the node.options configuration file: --max-old-space-size=6100
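
In case it helps anyone reproduce the fix, here is a minimal sketch of what that change can look like, assuming the stock Docker image keeps its configuration under /usr/share/opensearch-dashboards/config (the 6100 MB value is the one mentioned above; tune it to your host’s RAM, and treat the service name and mount path as examples):

  # node.options — raise the Node.js old-space (heap) ceiling, in MB
  --max-old-space-size=6100

  # docker-compose excerpt mounting the edited file into the container
  services:
    dashboards:
      image: opensearchproject/opensearch-dashboards:2.12.0
      volumes:
        - ./node.options:/usr/share/opensearch-dashboards/config/node.options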

My questions are:

  • What is the default value for the memory limit?
  • Are there any recommended values?

To Reproduce Steps to reproduce the behavior:

  1. Open any complex dashboard consisting of multiple items.

Expected behavior In 2.9, our dashboards were rendering properly.

OpenSearch Version 2.12 using Docker image opensearchproject/opensearch:2.12.0

Dashboards Version 2.12 using Docker image opensearchproject/opensearch-dashboards:2.12.0

Plugins Default list that came with distribution.

Screenshots Not applicable.

Host/Environment (please complete the following information):

  • Oracle Linux Server release 8.9
  • Ubuntu 23.10
  • Google Chrome Version 121.0.6167.160 (Official Build) (64-bit)

Additional context No additional context yet.

rlevytskyi avatar Feb 23 '24 19:02 rlevytskyi

Spike task: look into the performance issue from 2.9 to 2.11. @kavilla @manasvinibs

abbyhu2000 avatar Feb 27 '24 18:02 abbyhu2000

@rlevytskyi would you be willing to share any more details about your settings/plugins/indexes to help us reproduce and diagnose?

wbeckler avatar Feb 27 '24 18:02 wbeckler

Thank you @wbeckler for your reply! We are running a cluster without dedicated cluster manager nodes: four nodes act as both data and master-eligible nodes, and there are two coordinating nodes.

  • 4 data nodes with a 112 GB heap (Xmx) and 13.6 TB of storage
  • 5500 indices (mostly small with 1 shard, but several big ones with 4 shards), using up to 75% of storage capacity
  • 26600 shards
  • upgraded from 2.9 to 2.12 and raised the heap (Xmx) to 128 GB
  • had to close 2000 indices to make the cluster operable again

Some more details are in the related issue https://github.com/opensearch-project/OpenSearch/issues/12454
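
For reference, figures like the ones above can be pulled from the cluster health and _cat APIs. A sketch assuming a coordinating node reachable on localhost:9200 over HTTPS with basic auth (host, credentials, and TLS flags are placeholders; adjust to your setup):

  curl -sku admin:<password> "https://localhost:9200/_cluster/health?pretty"        # node, shard, and index counts at a glance
  curl -sku admin:<password> "https://localhost:9200/_cat/allocation?v"             # disk used vs. capacity per data node
  curl -sku admin:<password> "https://localhost:9200/_cat/indices?h=index" | wc -l  # total number of indices
  curl -sku admin:<password> "https://localhost:9200/_cat/shards?h=index" | wc -l   # total number of shards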

rlevytskyi avatar Feb 27 '24 19:02 rlevytskyi

Is it possible that the memory issue on your data nodes is starving your Dashboards containers of resources?

wbeckler avatar Mar 01 '24 23:03 wbeckler

@wbeckler Absolutely not; the VM running this Dashboards instance has 32 GB of RAM, and the coordinating node on the same VM has a 12 GB heap. There is no OOM or anything similar in the system logs. Adding some memory to Kibana helped.
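
For what it’s worth, one way to double-check that the larger heap actually took effect is the Dashboards status endpoint, which in the Kibana lineage that Dashboards is forked from reports process heap metrics; the reported limit should reflect the new --max-old-space-size value if the same metrics are exposed. A sketch assuming Dashboards listens on http://localhost:5601 with no base path or auth in front of it:

  curl -s "http://localhost:5601/api/status" | python3 -m json.tool | grep -A3 '"heap"'   # heap metrics of the Dashboards Node.js process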

rlevytskyi avatar Mar 07 '24 13:03 rlevytskyi