performance-analyzer
[BUG] org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus
What is the bug? Since upgrading from 2.12 to 2.13, I see the WARN message below spamming the logs:
Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])
How can one reproduce the bug? Install 2.13.
What is the expected behavior? No errors are logged.
What is your host/environment? OpenSearch 2.13 container.
Having this issue too.
Same here.
Same here
Same here.
Me too
Same here
Seeing this also.
Same here (after updating from 2.12 to 2.13).
Same here after updating from 2.12.0 to 2.13.0.
I have the same issue on 2.13
Same here on a fresh install on AlmaLinux 9.3.
same here
Same here :-( Version "2.13.0
cluster.name": "opensearch", "node.name": "ubuntu", "message": "Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])", "cluster.uuid": "P6RyCh4KS5SObyb7k05akA", "node.id": "V9D7KQFqRgKoiNsOop8UzQ" }
Same here. Version 2.13.0 on Debian 12. Is there perhaps a setting that needs to be set?
Downgraded to 2.12.0 to bypass the issue for now.
I upgraded back to 2.13.0 and removed the Performance Analyzer plugin, and the errors aren't appearing for me.
Seems to be related to JSON marshaling of a performance metric perhaps?
v1.13.0 https://github.com/opensearch-project/performance-analyzer/blob/42889919319fb0a1f89c6e07b58cd9f7ee2d8718/src/main/java/com/amazon/opendistro/elasticsearch/performanceanalyzer/collectors/CacheConfigMetricsCollector.java#L113-L114
main https://github.com/opensearch-project/performance-analyzer/blob/4928231bed654a6d14c3d27668e1e50e29280a38/src/main/java/org/opensearch/performanceanalyzer/collectors/CacheConfigMetricsCollector.java#L145-L146
I just disabled Performance Analyzer on my 2.14 cluster today, as described in https://opensearch.org/docs/latest/monitoring-your-cluster/pa/index/#disable-performance-analyzer. It was making the rolling upgrade troublesome.
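For anyone who wants to script that step, here is a minimal Java sketch of the disable call. The endpoint path and the request body are my reading of the linked documentation page, so please check them against the docs for your version. The class name, the method name, and the http://localhost:9200 default are made up for illustration. A cluster with the security plugin enabled would also need HTTPS and credentials, which this sketch leaves out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of OpenSearch or the Performance Analyzer plugin.
class DisablePerformanceAnalyzerSketch {

    // Builds the POST request that asks the cluster to turn Performance Analyzer off.
    // Assumption: path and JSON body follow the "disable Performance Analyzer" docs section.
    static HttpRequest buildDisableRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/_plugins/_performanceanalyzer/cluster/config"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"enabled\": false}"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an unsecured node listening on localhost:9200 unless a URL is passed in.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildDisableRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Building the request is kept separate from sending it so the URL and body can be inspected without a running cluster.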
Some exceptions are raised in the collectMetrics function of CacheConfigMetricsCollector. The current logic returns a CacheMaxSizeStatus with a null cacheMaxSize if an exception is raised, while we require it to be non-null.
@varunsrivathsav, @atharvasharma61, @psychbot, let's investigate this further to understand:
- What is causing the error to be thrown in 2.13?
- We should fix the above bug in the code to raise/log the exception, rather than returning an Object with null value.
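To make the failure mode concrete, here is a small self-contained Java sketch. It is not the plugin's source: CacheMaxSizeStatusSketch, CacheMaxSizeDemo, and both getter names are hypothetical, and the cache type string is only illustrative. It assumes the status object holds a boxed Long that is read through a getter returning a primitive long, which is what the "Cannot invoke java.lang.Long.longValue()" message suggests. The Optional-returning getter is just one possible guard, not the fix the maintainers chose.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for CacheMaxSizeStatus (not the plugin's source).
class CacheMaxSizeStatusSketch {
    private final String cacheType;
    private final Long cacheMaxSize; // boxed Long, so it may be null

    CacheMaxSizeStatusSketch(String cacheType, Long cacheMaxSize) {
        this.cacheType = cacheType;
        this.cacheMaxSize = cacheMaxSize;
    }

    String getCacheType() {
        return cacheType;
    }

    // Returning a primitive long forces auto-unboxing through Long.longValue(),
    // which throws NullPointerException when the field is null.
    long getCacheMaxSizeUnsafe() {
        return cacheMaxSize;
    }

    // One possible guard: make "no value was collected" explicit instead of unboxing.
    Optional<Long> getCacheMaxSize() {
        return Optional.ofNullable(cacheMaxSize);
    }
}

class CacheMaxSizeDemo {
    // Reports whether reading the value through the primitive getter fails.
    static boolean unboxingThrows(CacheMaxSizeStatusSketch status) {
        try {
            status.getCacheMaxSizeUnsafe();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Simulates the status object returned after collection failed.
        CacheMaxSizeStatusSketch failed = new CacheMaxSizeStatusSketch("Field_data_cache", null);
        System.out.println("unsafe getter throws: " + unboxingThrows(failed));
        System.out.println("guarded getter has value: " + failed.getCacheMaxSize().isPresent());
    }
}
```

A serializer that calls the primitive getter would hit the same exception, which would explain why the warning shows up as a JSON mapping error rather than at the point where the collection actually failed.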
While waiting for a fixed release, you can apply the following workaround:
On Debian or Ubuntu, make opensearch.service restart automatically when it fails, crashes, or exits uncleanly:
- Edit the service file, for example /lib/systemd/system/opensearch.service
- In the [Service] section, before [Install], add 2 lines:
Restart=on-failure
RestartSec=60s
- Run systemctl daemon-reload to reload the units
- Run systemctl restart opensearch and check the result
Example results:
...
# Allow a slow startup before the systemd notifier module kicks in to extend the timeout
TimeoutStartSec=75
Restart=on-failure
RestartSec=60s
[Install]
WantedBy=multi-user.target
...
I have the same issue on 2.14 Ubuntu 22.04
{
  "name" : "opensearch1",
  "cluster_name" : "graylog",
  "cluster_uuid" : "F71gNpV-TUSjVbscIkUSTg",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.14.0",
    "build_type" : "deb",
    "build_hash" : "aaa555453f4713d652b52436874e11ba258d8f03",
    "build_date" : "2024-05-09T18:50:48.052504416Z",
    "build_snapshot" : false,
    "lucene_version" : "9.10.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}