[BUG] Dashboards fails when one out of three opensearch nodes is down

Open piotrlg opened this issue 5 months ago • 0 comments

Describe the bug

On a multi-node cluster, Dashboards fails when one of the nodes is down.

Steps to reproduce:

1. Deploy a 3-node OpenSearch cluster with the security plugin enabled and tenants in use. Dashboards runs on node1. At this point everything works.
2. Turn off node2 or node3.
3. Cluster health is green and the OpenSearch API still works, e.g. using curl: I can list indices, add documents, etc.
4. After a few minutes (not immediately), Dashboards returns: Internal server error, status 500.
5. Turn the stopped node back on. Soon after it is up, Dashboards returns to normal operation.
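The reproduction steps above can be watched from the outside with a small probe. The following is a minimal sketch, not part of the original report: the node URLs in `NODES`, the `probe` and `report` helpers, and the 35-second client timeout are all assumptions chosen for illustration. With the security plugin enabled, the health endpoint will likely answer 401 unless an `Authorization` header is supplied.

```python
import ssl
import time
import urllib.error
import urllib.request

# Hypothetical endpoints: adjust host names and ports to the actual deployment.
NODES = ["https://node1:9200", "https://node2:9200", "https://node3:9200"]
DASHBOARDS = "https://localhost:5601/"


def probe(url, method="GET", timeout=35.0, headers=None,
          opener=urllib.request.urlopen):
    """Return (status, elapsed_ms); status is None if no HTTP response arrived."""
    # Certificate checks are disabled only because test clusters often use
    # self-signed certificates. Do not do this against production.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(url, method=method, headers=headers or {})
    start = time.monotonic()
    try:
        with opener(req, timeout=timeout, context=ctx) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 500 from Dashboards, 401 without credentials
    except (urllib.error.URLError, OSError):
        status = None  # node down, connection refused, or client-side timeout
    return status, round((time.monotonic() - start) * 1000)


def report(headers=None):
    """Print one line per endpoint: URL, HTTP status, elapsed milliseconds."""
    for node in NODES:
        print(node, *probe(node + "/_cluster/health", headers=headers))
    # HEAD / is the same request that appears in the Dashboards response log.
    print(DASHBOARDS, *probe(DASHBOARDS, method="HEAD", headers=headers))
```

Calling `report()` once a minute after stopping node2 or node3 should show roughly when Dashboards switches from a fast answer to a 500 after about 30 seconds, while the remaining nodes keep responding.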

In the Dashboards logs, this pattern repeats whenever one node is down and I try to use Dashboards:

{"type":"log","@timestamp":"2025-05-22T11:39:22Z","tags":["error","http","server","OpenSearchDashboards"],"pid":1,"message":"Error: Request Timeout after 30000ms\n at SecurityClient.dashboardsinfo (/usr/share/opensearch-dashboards/plugins/securityDashboards/server/backend/opensearch_security_client.ts:130:13)\n at processTicksAndRejections (node:internal/process/task_queues:95:5)\n at BasicAuthentication.resolveTenant (/usr/share/opensearch-dashboards/plugins/securityDashboards/server/auth/types/authentication_type.ts:263:28)\n at /usr/share/opensearch-dashboards/plugins/securityDashboards/server/auth/types/authentication_type.ts:178:24\n at Object.interceptAuth [as authenticate] (/usr/share/opensearch-dashboards/src/core/server/http/lifecycle/auth.js:116:22)\n at exports.Manager.execute (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/toolkit.js:60:28)\n at module.exports.internals.Auth._authenticate (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/auth.js:273:30)\n at Request._lifecycle (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/request.js:371:32)\n at Request._execute (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/request.js:281:9)"}
{"type":"error","@timestamp":"2025-05-22T11:38:52Z","tags":[],"pid":1,"level":"error","error":{"message":"Internal Server Error","name":"Error","stack":"Error: Internal Server Error\n at HapiResponseAdapter.toInternalError (/usr/share/opensearch-dashboards/src/core/server/http/router/response_adapter.js:69:19)\n at Object.interceptAuth [as authenticate] (/usr/share/opensearch-dashboards/src/core/server/http/lifecycle/auth.js:148:34)\n at processTicksAndRejections (node:internal/process/task_queues:95:5)\n at exports.Manager.execute (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/toolkit.js:60:28)\n at module.exports.internals.Auth._authenticate (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/auth.js:273:30)\n at Request._lifecycle (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/request.js:371:32)\n at Request._execute (/usr/share/opensearch-dashboards/node_modules/@hapi/hapi/lib/request.js:281:9)"},"url":"https://localhost:5601/","message":"Internal Server Error"}
{"type":"response","@timestamp":"2025-05-22T11:38:52Z","tags":[],"pid":1,"method":"head","statusCode":500,"req":{"url":"/","method":"head","headers":{"host":"localhost:5601","user-agent":"curl/7.76.1","accept":"*/*"},"remoteAddress":"10.89.0.4","userAgent":"curl/7.76.1"},"res":{"statusCode":500,"responseTime":30154,"contentLength":9},"message":"HEAD / 500 30154ms - 9.0B"}
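To pull the failing requests out of a longer Dashboards log, the JSON lines can be filtered on the fields visible in the entries above (`type`, `req.url`, `res.statusCode`, `res.responseTime`). This is a minimal sketch; the helper name `failed_responses` and the `min_status` threshold are assumptions made for illustration.

```python
import json


def failed_responses(lines, min_status=500):
    """Yield (timestamp, method, url, status, response_time_ms) per failing response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip stack-trace continuation lines and other non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if entry.get("type") != "response":
            continue  # "log" and "error" entries carry no response timing
        req = entry.get("req", {})
        res = entry.get("res", {})
        if res.get("statusCode", 0) >= min_status:
            yield (entry.get("@timestamp"), req.get("method"), req.get("url"),
                   res.get("statusCode"), res.get("responseTime"))
```

Feeding it an open log file, for example `list(failed_responses(open(path)))` with `path` pointing at a saved copy of the container log, gives one tuple per 5xx response, which makes the recurring 30-second response times easy to spot.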

Expected behavior

When one out of three nodes is down, Dashboards should still work.

OpenSearch Version

2.19.1, in containers.

Dashboards Version

2.19.1, in containers.

Plugins

All plugins that ship by default. In particular, the security plugin is configured and enabled.

Host/Environment (please complete the following information):

  • OS: RHEL 9 as host, RHEL 8 as base image for container
  • Browser: latest Microsoft Edge

Additional context

Worth mentioning: when I turned off tenants, Dashboards started to work better, but it is still not stable. There is no more Internal server error with status 500, and I can use Dashboards, but some features (like listing all indices in Index management) sometimes work and sometimes do not. Dev tools, for example, looks fine, but Index management does not. In this scenario, with tenants off and one node down, I see the following in the Dashboards logs:

{"type":"response","@timestamp":"2025-05-27T14:37:38Z","tags":[],"pid":1,"method":"get","statusCode":500,"req":{"url":"/internal/index-pattern-management/resolve_index/*","method":"get","headers":{"host":"10.17.229.167:5601","connection":"keep-alive","sec-ch-ua-platform":"\"Windows\"","sec-ch-ua":"\"Chromium\";v=\"136\", \"Microsoft Edge\";v=\"136\", \"Not.A/Brand\";v=\"99\"","sec-ch-ua-mobile":"?0","osd-version":"2.19.1","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0","dnt":"1","content-type":"application/json","osd-xsrf":"osd-fetch","accept":"*/*","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://10.17.229.167:5601/app/management/opensearch-dashboards/indexPatterns","accept-encoding":"gzip, deflate, br, zstd","accept-language":"en-US,en;q=0.9,de-AT;q=0.8,de;q=0.7","securitytenant":""},"remoteAddress":"10.89.0.27","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0","referer":"https://10.17.229.167:5601/app/management/opensearch-dashboards/indexPatterns"},"res":{"statusCode":500,"responseTime":30022,"contentLength":9},"message":"GET /internal/index-pattern-management/resolve_index/* 500 30022ms - 9.0B"}
Index Management - IndexService - _getManagedStatus: StatusCodeError: Request Timeout after 30000ms
    at /usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:397:9
    at Timeout.<anonymous> (/usr/share/opensearch-dashboards/node_modules/elasticsearch/src/lib/transport.js:429:7)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7) {
  status: undefined,
  displayName: 'RequestTimeout',
  body: undefined
}
{"type":"response","@timestamp":"2025-05-27T14:37:48Z","tags":[],"pid":1,"method":"get","statusCode":200,"req":{"url":"/api/ism/_indices?from=0&size=20&search=&sortField=index&sortDirection=desc&showDataStreams=false","method":"get","headers":{"host":"10.17.229.167:5601","connection":"keep-alive","sec-ch-ua-platform":"\"Windows\"","sec-ch-ua":"\"Chromium\";v=\"136\", \"Microsoft Edge\";v=\"136\", \"Not.A/Brand\";v=\"99\"","sec-ch-ua-mobile":"?0","osd-version":"2.19.1","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0","dnt":"1","content-type":"application/json","osd-xsrf":"osd-fetch","accept":"*/*","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://10.17.229.167:5601/app/opensearch_index_management_dashboards","accept-encoding":"gzip, deflate, br, zstd","accept-language":"en-US,en;q=0.9,de-AT;q=0.8,de;q=0.7","securitytenant":""},"remoteAddress":"10.89.0.27","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0","referer":"https://10.17.229.167:5601/app/opensearch_index_management_dashboards"},"res":{"statusCode":200,"responseTime":60021,"contentLength":9},"message":"GET /api/ism/_indices?from=0&size=20&search=&sortField=index&sortDirection=desc&showDataStreams=false 200 60021ms - 9.0B"}
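The response times in these entries sit just above whole multiples of the 30000 ms from the "Request Timeout after 30000ms" messages. The small sketch below makes that arithmetic explicit; the helper name `timeout_periods` is hypothetical, and reading the 60021 ms request as two consecutive timeouts is an inference from the numbers, not something the logs confirm.

```python
def timeout_periods(response_time_ms, timeout_ms=30000):
    """Return (whole timeout periods elapsed, leftover milliseconds)."""
    return divmod(response_time_ms, timeout_ms)


# Response times taken from the log entries in this report.
for ms in (30154, 30022, 60021):
    periods, leftover = timeout_periods(ms)
    print(f"{ms} ms = {periods} x 30000 ms + {leftover} ms")
```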


piotrlg, May 28 '25 07:05