OpenSearch-Dashboards
[BUG] opensearch-with-long-numerals blocks and times out from Discover page
Describe the bug
When attempting to view logs on the Discover page with long-numerals, the Kibana instance fails to respond to requests, causing health-check failures as well as 500 errors in the Kibana logs.
When attempting the same query from DevTools or outside Kibana (curl) no error is returned and results are supplied.
To Reproduce
Steps to reproduce the behavior:
- Go to Discover page
- Create search to include logs with long-numerals
- Kibana will stop responding for ~5 min.
Kibana logs, after a ~5 minute gap with no new log entries:
{"type":"log","@timestamp":"2024-04-08T23:14:18Z","tags":["error","opensearch","data"],"pid":1,"message":"[DeserializationError]: Maximum call stack size exceeded"}
{"type":"response","@timestamp":"2024-04-08T23:09:18Z","tags":[],"pid":1,"method":"post","statusCode":500,"req":{"url":"/internal/search/opensearch-with-long-numerals","method":"post","headers":{"x-forwarded-for":"x.x.x.x","x-forwarded-proto":"https","x-forwarded-port":"443","host":"kibana.url.com","x-amzn-trace-id":"Root=1-6614791e-105ca136348f5fd27136bf8f","content-length":"2220","sec-ch-ua":"\"Google Chrome\";v=\"123\", \"Not:A-Brand\";v=\"8\", \"Chromium\";v=\"123\"","content-type":"application/json","osd-xsrf":"osd-fetch","sec-ch-ua-mobile":"?0","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36","osd-version":"2.13.0","sec-ch-ua-platform":"\"Windows\"","accept":"*/*","origin":"https://kibana.url.com","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","referer":"https://kibana.url.com/app/data-explorer/discover","accept-encoding":"gzip, deflate, br, zstd","accept-language":"en-US,en;q=0.9","securitytenant":"tenant"},"remoteAddress":"x.x.x.x","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36","referer":"https://kibana.url.com/app/data-explorer/discover"},"res":{"statusCode":500,"responseTime":300915,"contentLength":9},"message":"POST /internal/search/opensearch-with-long-numerals 500 300915ms - 9.0B"}
Expected behavior Kibana is expected to continue responding to other requests while processing. Kibana is expected to not throw a 500 error.
OpenSearch Version 2.13.0
Dashboards Version 2.13.0
Host/Environment (please complete the following information):
- OS: CentOS7 from docker image
- Chrome 123.0.6312.106
After more testing I was able to find a workaround, along with some details that might be helpful.
I was able to duplicate the issue by pushing 200 documents into an index. Each had a single field, initially a string containing JSON which was around 15k characters in length.
When viewing these in Discover I watched the Kibana logs for POST /internal/search/opensearch-with-long-numerals to get the time taken before Discover rendered or errored.
Initially, this took 37541ms.
I repeated this with the single large field only containing "A" repeated 15k times. This completed in 96ms.
I then removed characters and captured the time taken in opensearch-with-long-numerals. Each iteration built on the last: for example, in the second pass both `[]'"/\` and `{}` were left out. In the last pass, all characters listed in the table below had been removed from the original JSON string.
| chars removed | duration (ms) | ms saved |
|---|---|---|
| (none) | 37541 | |
| `{}` | 36983 | 558 |
| `[]'"/\` | 27350 | 9633 |
| `\|` | 27350 | 0 |
| `:` | 25191 | 2159 |
| `,` | 86 | 25105 |
With this in mind I replaced only the commas (`,`) with `::` in the original JSON strings. This ran for just 665ms; compared with the initial delay of 37541ms, a marked improvement.
I then replaced commas with pipes (`|`) on 800 documents, and opensearch-with-long-numerals began returning a 500 error in 789ms.
opensearch-with-long-numerals still blocks while in progress, which is not ideal. However, with this character replacement it fails in under 1 second instead of 37 seconds or longer, which gives Kibana a larger margin to respond to health checks.
These findings are great. I will use this to figure out the bottleneck.
@lyradc the source of the exception is the opensearch-js client, which uses secure-json-parse; that adds some overhead. However, long-numerals handling also adds overhead of its own. To speed up my investigation, would you be able to share one of the documents you use for testing?
@lyradc I received the payload; thanks a lot for sending it over. I will dig more and get back to you in a few days.
As I wrote in https://github.com/opensearch-project/OpenSearch-Dashboards/issues/6134#issuecomment-2049586388, try it with a very long log line that contains this type of message.
Here's an example from our Cassandra: cassandra_paxos_example.json
If you have a message like this 2 or 3 times in your time range, you'll definitely see that Dashboards hangs for a very long time.
Any update here? 2.14.0 was released last week, but it is still broken. I've just tested it with the example from my previous post.
We are facing exactly the same issue on our OpenSearch setup. It would be great to have a solution for this issue.
It seems that in 2.14 it simply doesn't return a hit; it fails silently. Query in Dashboards/Discover for the _id of a doc known to have a long integer in the message field:
"hits": {
"total": 0,
"max_score": null,
"hits": []
},
The same query from Dev Tools:
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "###########-000163",
"_id": "5rY_yo8BUoZvL84giDht",
...
I have made a new package named JSON11 for handling long numerals. https://github.com/opensearch-project/opensearch-js/pull/784 will add it to the JS client, and then OSD will adopt it with the appropriate code changes.