[BUG] opensearch-with-long-numerals blocks and times out from Discover page

Open lyradc opened this issue 10 months ago • 5 comments

Describe the bug

When viewing logs that contain long-numerals on the Discover page, the Kibana instance stops responding to requests. This causes health checks to fail and a 500 to appear in the Kibana logs.

When the same query is run from DevTools or outside Kibana (curl), no error is returned and results are supplied.

To Reproduce

Steps to reproduce the behavior:

  1. Go to Discover page
  2. Create a search that includes logs with long-numerals
  3. Kibana will stop responding for ~5 min.

Kibana logs after ~5min delay of no new logs...

{"type":"log","@timestamp":"2024-04-08T23:14:18Z","tags":["error","opensearch","data"],"pid":1,"message":"[DeserializationError]: Maximum call stack size exceeded"}
{"type":"response","@timestamp":"2024-04-08T23:09:18Z","tags":[],"pid":1,"method":"post","statusCode":500,"req":{"url":"/internal/search/opensearch-with-long-numerals","method":"post","headers":{"x-forwarded-for":"x.x.x.x","x-forwarded-proto":"https","x-forwarded-port":"443","host":"kibana.url.com","x-amzn-trace-id":"Root=1-6614791e-105ca136348f5fd27136bf8f","content-length":"2220","sec-ch-ua":"\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"","content-type":"application/json","osd-xsrf":"osd-fetch","sec-ch-ua-mobile":"?0","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36","osd-version":"2.13.0","sec-ch-ua-platform":"\"Windows\"","accept":"*/*","origin":"https://kibana.url.com","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://kibana.url.com/app/data-explorer/discover","accept-encoding":"gzip, deflate, br, zstd","accept-language":"en-US,en;q=0.9","securitytenant":"tenant"},"remoteAddress":"x.x.x.x","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36","referer":"https://kibana.url.com/app/data-explorer/discover"},"res":{"statusCode":500,"responseTime":300915,"contentLength":9},"message":"POST /internal/search/opensearch-with-long-numerals 500 300915ms - 9.0B"}
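The response entry above reports `responseTime` in milliseconds, and 300915ms matches the five-minute gap between the two timestamps. A minimal sketch for pulling slow responses out of logs in this shape follows. The sample lines are abridged copies of the two entries above, and the `slow_responses` helper and its threshold are hypothetical, not part of Dashboards.

```python
import json

# Abridged copies of the two log lines above: only the fields used here are kept.
SAMPLE_LOG = "\n".join([
    '{"type":"log","@timestamp":"2024-04-08T23:14:18Z","tags":["error","opensearch","data"],"pid":1,'
    '"message":"[DeserializationError]: Maximum call stack size exceeded"}',
    '{"type":"response","@timestamp":"2024-04-08T23:09:18Z","method":"post","statusCode":500,'
    '"req":{"url":"/internal/search/opensearch-with-long-numerals"},'
    '"res":{"statusCode":500,"responseTime":300915,"contentLength":9}}',
])


def slow_responses(log_text, threshold_ms=1000):
    """Yield (url, status, response_time_ms) for response entries taking at least threshold_ms."""
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Only "response" entries carry a request URL and a response time.
        if entry.get("type") != "response":
            continue
        elapsed = entry.get("res", {}).get("responseTime", 0)
        if elapsed >= threshold_ms:
            yield entry["req"]["url"], entry["statusCode"], elapsed


results = list(slow_responses(SAMPLE_LOG))
for url, status, elapsed in results:
    print(f"{status} {url} {elapsed / 1000:.1f}s")
```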

Expected behavior

Kibana is expected to continue responding to other requests while processing. Kibana is expected to not throw a 500 error.

OpenSearch Version 2.13.0

Dashboards Version 2.13.0

Host/Environment (please complete the following information):

  • OS: CentOS7 from docker image
  • Chrome 123.0.6312.106

lyradc avatar Apr 09 '24 00:04 lyradc

After more testing I was able to find a workaround, along with some details that might be helpful.

I was able to duplicate the issue by pushing 200 documents into an index. Each had a single field, initially a string containing JSON which was around 15k characters in length. When viewing these in Discover I watched the Kibana logs for POST /internal/search/opensearch-with-long-numerals to get the time taken before Discover rendered or errored. Initially this took 37541ms.

I repeated this with the single large field only containing "A" repeated 15k times. This completed in 96ms.
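A test set like the control case above could be generated as an NDJSON body for the `_bulk` API. This is only a sketch: the index name `discover-test` and the field name `message` are assumptions, since the comment does not name either, and the original 15k-character JSON string is not reproduced here.

```python
import json


def build_bulk_body(field_value, count=200, index="discover-test", field="message"):
    """Build an NDJSON _bulk body with `count` identical single-field documents.

    `index` and `field` are hypothetical names chosen for illustration.
    """
    lines = []
    for _ in range(count):
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps({field: field_value}))          # document line
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Control case from the comment: a single field holding "A" repeated 15k times.
control_body = build_bulk_body("A" * 15_000)
```

The resulting string can be sent with `Content-Type: application/x-ndjson` to the cluster's `_bulk` endpoint using any HTTP client.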

I then removed characters and captured the time taken in opensearch-with-long-numerals.

Each iteration built on the last. For example, in the second pass both {} and []'"/\ were removed. In the last pass all characters listed in the table below were removed from the original JSON string.

chars removed    duration (ms)    saved (ms)
(none)           37541            -
{}               36983            558
[]'"/\           27350            9633
|                27350            0
:                25191            2159
,                86               25105
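The cumulative passes in the table can be sketched as follows. The sample string is a made-up stand-in for the real 15k-character document, and the helper names are hypothetical. The durations are copied from the table so that the "saved" column can be recomputed.

```python
# Character sets removed in each pass, in the order listed in the table.
PASSES = ["{}", "[]'\"/\\", "|", ":", ","]

# Durations from the table, starting with the unmodified string.
DURATIONS_MS = [37541, 36983, 27350, 27350, 25191, 86]


def cumulative_variants(text, passes=PASSES):
    """Return [(removed_chars, stripped_text), ...], each pass building on the last."""
    removed = ""
    variants = [(removed, text)]
    for chars in passes:
        removed += chars
        # The three-argument form of str.maketrans deletes every char in `removed`.
        variants.append((removed, text.translate(str.maketrans("", "", removed))))
    return variants


def savings(durations):
    """Milliseconds saved by each pass relative to the previous one."""
    return [before - after for before, after in zip(durations, durations[1:])]


SAMPLE = '{"a": [1, 2], "b": "x|y"}'  # hypothetical stand-in document
variants = cumulative_variants(SAMPLE)
saved = savings(DURATIONS_MS)
```

Recomputing the savings this way shows that removing commas accounts for 25105ms of the original 37541ms.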

With this in mind, I went back to the original JSON strings and replaced only the commas (,) with ::. The request then ran for only 665ms, a marked improvement over the initial 37541ms.

I then replaced commas with pipes on 800 documents and opensearch-with-long-numerals began returning a 500 error in 789ms.

opensearch-with-long-numerals still blocks while in progress, which is not ideal. With this character replacement, however, it can fail in under 1 second instead of 37 seconds or longer, which gives Kibana a larger margin to respond to health checks.
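The comma replacement could be applied before indexing along these lines. This is a sketch only: `replace_commas` is a hypothetical helper and the field name is an assumption. The substitution is lossy if the original text already contains the replacement string.

```python
def replace_commas(doc, field, replacement="::"):
    """Return a copy of `doc` with commas in the string `field` replaced."""
    value = doc.get(field)
    if not isinstance(value, str):
        # Leave documents without a string in this field unchanged.
        return dict(doc)
    updated = dict(doc)
    updated[field] = value.replace(",", replacement)
    return updated


# Hypothetical document; "message" is an assumed field name.
cleaned = replace_commas({"message": '{"a":1,"b":2}'}, "message")
```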

lyradc avatar Apr 10 '24 22:04 lyradc

These findings are great. I will use them to figure out the bottleneck.

AMoo-Miki avatar Apr 11 '24 04:04 AMoo-Miki

@lyradc the source of the exception is the opensearch-js client which uses secure-json-parse which adds some overhead. However, long-numerals also adds some overhead. In order to speed up my investigation, would you be able to share one of the documents you use for testing?

AMoo-Miki avatar Apr 11 '24 05:04 AMoo-Miki

@lyradc I received the payload; thanks a lot for sending it over. I will dig more and get back to you in a few days.

AMoo-Miki avatar Apr 11 '24 17:04 AMoo-Miki

As I wrote in https://github.com/opensearch-project/OpenSearch-Dashboards/issues/6134#issuecomment-2049586388, try it with a very long log line which contains this type of message.

Here's an example from our Cassandra: cassandra_paxos_example.json

If you have a message like this 2 or 3 times in your time range, you'll definitely see that Dashboards hangs for a very long time.

rlueckl avatar Apr 15 '24 09:04 rlueckl

Any update here? 2.14.0 was released last week, but it is still broken. I've just tested it with the example from my previous post.

rlueckl avatar May 21 '24 08:05 rlueckl

We are facing exactly the same issue on our OpenSearch setup. It would be great to have a solution for this issue.

lsoumille avatar May 23 '24 07:05 lsoumille

It seems that in 2.14 it simply doesn't return a hit and fails silently. This is a query in Dashboards/Discover for the _id of a doc known to have a long int in its message field:

       "hits": {
            "total": 0,
            "max_score": null,
            "hits": []
        },

same query from dev-tools

  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "###########-000163",
        "_id": "5rY_yo8BUoZvL84giDht",
        ...

bbfoto avatar Jun 02 '24 14:06 bbfoto

I have made a new package named JSON11 for handling long numerals. https://github.com/opensearch-project/opensearch-js/pull/784 will add that to the JS client and then OSD will adopt it with the appropriate code changes.
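As background on why long numerals need special handling in a JavaScript client: JavaScript numbers are IEEE-754 doubles, so integers above 2^53 - 1 cannot all be represented exactly. The sketch below emulates that in Python by parsing JSON integers as floats. It only illustrates the precision problem and says nothing about how JSON11 or the opensearch-js client are implemented.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER

RAW = '{"id": 9007199254740993}'  # 2**53 + 1, a made-up long numeral

# Python integers are arbitrary precision, so the default parse is exact.
exact = json.loads(RAW)["id"]

# Parsing integers as doubles emulates what a plain JavaScript JSON.parse would hold.
as_double = json.loads(RAW, parse_int=float)["id"]

print(exact > MAX_SAFE_INTEGER)  # True: outside the safe integer range
print(int(as_double) == exact)   # False: the last digit was lost to rounding
```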

AMoo-Miki avatar Jun 03 '24 18:06 AMoo-Miki