OpenSearch-Dashboards
[BUG] opensearch-with-long-numerals runs into timeout
Describe the bug
Don't really know how to describe this. OpenSearch Dashboards 2.12.0 fails to fetch data resulting in a timeout, truncated response and broken JSON where OpenSearch Dashboards 2.11.0 works perfectly fine.
To Reproduce
Don't know. Tried to compare 2.11.0 with 2.12.0. The only difference I found is that 2.12.0 calls POST /internal/search/opensearch-with-long-numerals whereas 2.11.0 calls POST /internal/search/opensearch for the exact same query. So there might be a problem with the "long-numerals" part.
The query is a simple 15 second time window on one of our indices. 2.11.0 gives back 397 hits with a response size of 1,02MB within 260ms according to the developer console. 2.12.0 runs into a timeout (120sec) then throws the following error:
JSON.parse: expected ',' or '}' after property value in object at line 1 column 306251 of the JSON data
HttpFetchError@https://host03.server.lan/7326/bundles/core/core.entry.js:15:184257
fetchResponse@https://host03.server.lan/7326/bundles/core/core.entry.js:15:191557
The response size is also 1,02MB (after all it's the same query).
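For anyone trying to find what sits at the reported column, here is a minimal sketch of a helper that prints the text around a given position in the raw response body. It is not part of OpenSearch Dashboards: `contextAround` and `rawBody` are hypothetical names, and the sketch assumes the column in the error message is 1-based and that the body is a single line, as the "line 1" in the message suggests.

```typescript
// Hypothetical helper, not part of OpenSearch Dashboards.
// Returns the text surrounding a 1-based column in a single-line payload.
function contextAround(text: string, column: number, radius: number = 40): string {
  // Convert the 1-based column to a 0-based string index.
  const index = Math.max(0, column - 1);
  const start = Math.max(0, index - radius);
  const end = Math.min(text.length, index + radius);
  return text.slice(start, end);
}

// Usage sketch: rawBody stands in for the response text copied from the
// browser's network tab; the content here is a placeholder.
const rawBody: string = '{"message":"ranges [(1,2],"}';
console.log(contextAround(rawBody, 12, 8));
```

With the real response saved as text, calling `contextAround(body, 306251)` should show which field the parser was in when it gave up.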
No errors visible in the log (journalctl -u opensearch-dashboards.service) of OpenSearch Dashboards.
Expected behavior
Dashboards 2.12.0 works the same as 2.11.0
OpenSearch Version 2.12.0 (Debian Package installed from artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt)
Dashboards Version 2.12.0 (Debian Package installed from artifacts.opensearch.org/releases/bundle/opensearch-dashboards/2.x/apt)
Plugins
# OPENSEARCH_JAVA_HOME=/usr/share/opensearch/jdk /usr/share/opensearch/bin/opensearch-plugin list
opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-cross-cluster-replication
opensearch-custom-codecs
opensearch-flow-framework
opensearch-geospatial
opensearch-index-management
opensearch-job-scheduler
opensearch-knn
opensearch-ml
opensearch-neural-search
opensearch-notifications
opensearch-notifications-core
opensearch-observability
opensearch-performance-analyzer
opensearch-reports-scheduler
opensearch-security
opensearch-security-analytics
opensearch-skills
opensearch-sql
prometheus-exporter
(Prometheus Exporter Plugin from: Aiven-Open/prometheus-exporter-plugin-for-opensearch)
Screenshots
Comparing request & response headers with Meld:
Exact same query with the same request and response sizes results in different runtimes and an error on 2.12.0 (/internal/search/opensearch-with-long-numerals) vs. 2.11.0 (/internal/search/opensearch)
2.11.0 works as expected:
2.12.0 times out and throws an error:
Host/Environment (please complete the following information):
- Server OS: Debian 12 Bookworm
- Client OS: Linux Mint 21.3 Virginia
- Firefox 123.0
I could narrow it down to one specific log from a Cassandra system.log which apparently causes the timeout/JSON Parse error in 2.12.0. Two examples attached:
cassandra_example1.log cassandra_example2.log
The "message" and "logmessage" fields are quite long, but it's a normal output for Cassandra and causes no issues in Dashboards 2.11.0
Looking at the examples the error apparently happens when Dashboards parses the message field:
cassandra_example1.log:
Completing uncommitted paxos instances for ****** on ranges [(9206423891869844203,-9207833944162114199],
^ this is where the syntax error happens
cassandra_example2.log:
Completed 0 uncommitted paxos instances for ****** on ranges [(9206423891869844203,-9207833944162114199],
^ this is where the syntax error happens
So it looks like the error is happening within a string (the "message" field). Why does Dashboards try to parse this string as JSON?
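As a side note on why these particular messages might stand out: the numerals in the Cassandra ranges are larger than JavaScript's `Number.MAX_SAFE_INTEGER` (2^53 - 1), so they cannot be held exactly in a plain `Number`. The sketch below only demonstrates that fact for digit runs inside a string. `findUnsafeNumerals` is a hypothetical name, and nothing here is taken from the Dashboards code, so whether this is what triggers the long-numerals handling is an assumption.

```typescript
// Hypothetical illustration, not the OpenSearch Dashboards implementation.
// Collects digit runs (with optional leading minus) whose value falls
// outside the range JavaScript can represent exactly as a Number.
function findUnsafeNumerals(text: string): string[] {
  const tokens = text.match(/-?\d+/g) || [];
  return tokens.filter((token) => !Number.isSafeInteger(Number(token)));
}

// Message fragment quoted from cassandra_example1.log above.
const fragment = "on ranges [(9206423891869844203,-9207833944162114199],";
console.log(findUnsafeNumerals(fragment));

// Small numbers such as hit counts or timings are not flagged.
console.log(findUnsafeNumerals("397 hits within 260ms"));
```

Both numerals in the quoted fragment are flagged, while ordinary small integers are not, which would at least be consistent with the problem only showing up for some indices.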
Settings for this particular index and fields:
I've created a smaller example which also throws the JSON parse error in Dashboards 2.12.0:
Steps to reproduce:
- Add the following document to an index in your opensearch:
$ curl -v -H "Content-Type: application/json" -X POST "https://myopensearchhost01.server.lan:9200/logstash-2024.03.01/_doc" -d@minimal_example.json -u "user:pass"
- Use "Discover" in OpenSearch Dashboards 2.12.0 and try to query a timerange which contains the document (Feb. 28th, 05:32).
- You'll get the above-mentioned exception.
- Same thing with OpenSearch Dashboards 2.11.0 works fine.
We are also facing the same problem after the 2.12.0 upgrade; any leads or a fix would be appreciated. Surprisingly, it's only happening with some specific indices.
@AMoo-Miki is this fixed? could you double check and resolve this issue?
@ananzh - can you please share details about which release version contains the fix for this issue, or whether it will be included in a future release?
This is occurring for me as well. Is there any update to this?
Looks like this may have been addressed in the 2.13 release here: https://github.com/opensearch-project/OpenSearch-Dashboards/issues/6134
Can anybody confirm if the bug has been fixed in 2.13.0? I don't have a test cluster unfortunately.
I've tried updating Dashboards only, but it seems that it's not backwards compatible with server version 2.12.0:
{"type":"log","@timestamp":"2024-04-10T06:03:55Z","tags":["error","savedobjects-service"],"pid":330933,"message":"This version of OpenSearch Dashboards (v2.13.0) is incompatible with the following OpenSearch nodes in your cluster: v2.12.0 @ hostname01.lan/10.x.x.x:9200 (10.x.x.x), v2.12.0 @ hostname02.lan/10.x.x.x:9200 (10.x.x.x)"}
We have taken the hotfix and built our own 2.12 snapshot version. Using your minimal data led to no error.
We ran the data on our 2.13 test cluster and saw no problem there either.
It seems safe to upgrade to 2.13 as far as the long numerals bug is concerned. To be safe, we will wait for community experience with 2.13.
Hi @cinhtau ,
please see my comment in #6134: the minimal example works now, but longer examples still lead to loops: https://github.com/opensearch-project/OpenSearch-Dashboards/issues/6134#issuecomment-2049586388
#6377 seems to be related