[BUG] org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus

Open coredump17 opened this issue 1 year ago • 15 comments

What is the bug? Since upgrading from 2.12 to 2.13, I see the following WARN message spamming the logs: Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])

How can one reproduce the bug? Install 2.13.

What is the expected behavior? No errors logged.

What is your host/environment? OpenSearch 2.13 container.

coredump17 avatar Apr 06 '24 09:04 coredump17

Having this issue too.

timolow avatar Apr 09 '24 19:04 timolow

Same here.

guldil avatar Apr 10 '24 04:04 guldil

Same here

sarankup avatar Apr 10 '24 05:04 sarankup

Same here.

Gradlon avatar Apr 11 '24 10:04 Gradlon

Me too

ComBin avatar Apr 11 '24 11:04 ComBin

Same here

geckiss avatar Apr 12 '24 13:04 geckiss

Seeing this also.

dxturner avatar Apr 15 '24 14:04 dxturner

Same here (after updating from 2.12 to 2.13).

slayerjk avatar Apr 17 '24 10:04 slayerjk

Same here after updating from 2.12.0 to 2.13.0.

rtista avatar Apr 18 '24 17:04 rtista

I have the same issue on 2.13

22charud avatar Apr 19 '24 00:04 22charud

Same here on a fresh install on AlmaLinux 9.3.

TheHansam avatar Apr 22 '24 06:04 TheHansam

same here

pmarjou22 avatar Apr 22 '24 12:04 pmarjou22

Same here :-( Version "2.13.0

cluster.name": "opensearch", "node.name": "ubuntu", "message": "Json Mapping Error: Cannot invoke "java.lang.Long.longValue()" because "this.cacheMaxSize" is null (through reference chain: org.opensearch.performanceanalyzer.collectors.CacheConfigMetricsCollector$CacheMaxSizeStatus["Cache_MaxSize"])", "cluster.uuid": "P6RyCh4KS5SObyb7k05akA", "node.id": "V9D7KQFqRgKoiNsOop8UzQ" }

tdankers avatar Apr 23 '24 20:04 tdankers

Same here. Version 2.13.0 on Debian 12. Is there perhaps a setting that needs to be set?

Downgraded to 2.12.0 to bypass the issue for now.

merlinz01 avatar Apr 26 '24 15:04 merlinz01

I upgraded back to 2.13.0 and removed the Performance Analyzer plugin, and the errors aren't appearing for me.

Seems to be related to JSON marshaling of a performance metric perhaps?

v1.13.0 https://github.com/opensearch-project/performance-analyzer/blob/42889919319fb0a1f89c6e07b58cd9f7ee2d8718/src/main/java/com/amazon/opendistro/elasticsearch/performanceanalyzer/collectors/CacheConfigMetricsCollector.java#L113-L114

main https://github.com/opensearch-project/performance-analyzer/blob/4928231bed654a6d14c3d27668e1e50e29280a38/src/main/java/org/opensearch/performanceanalyzer/collectors/CacheConfigMetricsCollector.java#L145-L146

merlinz01 avatar May 04 '24 00:05 merlinz01

I just disabled Performance Analyzer on my cluster (version 2.14) today, as described in https://opensearch.org/docs/latest/monitoring-your-cluster/pa/index/#disable-performance-analyzer. It was making the rolling upgrade troublesome.

cinhtau avatar May 29 '24 08:05 cinhtau

Some exceptions are raised in the collectMetrics function in CacheConfigMetricsCollector. When an exception is raised, the current logic returns a CacheMaxSizeStatus whose cacheMaxSize is null, but the value is required to be non-null.

@varunsrivathsav, @atharvasharma61, @psychbot, let's investigate this further to understand:

  • What is causing the error to be thrown in 2.13?
  • We should fix the above bug in the code to raise/log the exception, rather than returning an Object with null value.
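
A minimal sketch of the failure mode described above, assuming the getter on CacheMaxSizeStatus returns a primitive long while the field is a boxed Long. That assumption matches the error message, but the code is not copied from the plugin source. The class names CacheMaxSizeSketch, UnsafeStatus and NullSafeStatus are hypothetical, and returning the boxed value is only one possible fix, not necessarily the one the maintainers will choose:

```java
public class CacheMaxSizeSketch {

    /** Hypothetical stand-in for the failing status object. */
    static class UnsafeStatus {
        // Assumption: the field is a boxed Long and is null when collection failed.
        private final Long cacheMaxSize;

        UnsafeStatus(Long cacheMaxSize) {
            this.cacheMaxSize = cacheMaxSize;
        }

        // Returning a primitive forces auto-unboxing, i.e. cacheMaxSize.longValue().
        // With a null field this throws NullPointerException, which a JSON
        // serializer calling the getter would then report as a mapping error.
        long getCacheMaxSize() {
            return cacheMaxSize;
        }
    }

    /** Hypothetical null-tolerant variant: the getter keeps the boxed type. */
    static class NullSafeStatus {
        private final Long cacheMaxSize;

        NullSafeStatus(Long cacheMaxSize) {
            this.cacheMaxSize = cacheMaxSize;
        }

        // No unboxing happens here, so a missing value stays null instead of throwing.
        Long getCacheMaxSize() {
            return cacheMaxSize;
        }
    }

    /** Returns true if reading a null size through the primitive getter throws. */
    static boolean throwsOnNull() {
        try {
            new UnsafeStatus(null).getCacheMaxSize();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe getter throws on null: " + throwsOnNull());
        System.out.println("null-safe getter returns: "
                + new NullSafeStatus(null).getCacheMaxSize());
    }
}
```

The sketch only covers the symptom. It does not explain why collectMetrics starts hitting an exception in 2.13, which is the first open question above.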

ansjcy avatar May 29 '24 20:05 ansjcy

While waiting for a fixed release, you can apply the following workaround. On Debian or Ubuntu, make opensearch.service restart automatically when it fails, crashes, or exits uncleanly:

  • Edit the service file, for example /lib/systemd/system/opensearch.service
  • In the [Service] section, before [Install], add these 2 lines:
Restart=on-failure
RestartSec=60s
  • Run systemctl daemon-reload to reload the units
  • Run systemctl restart opensearch and check the result

Example of the resulting service file:

...
# Allow a slow startup before the systemd notifier module kicks in to extend the timeout
TimeoutStartSec=75

Restart=on-failure
RestartSec=60s

[Install]
WantedBy=multi-user.target
...

caothu159 avatar May 31 '24 08:05 caothu159

I have the same issue on 2.14, Ubuntu 22.04:

{
  "name" : "opensearch1",
  "cluster_name" : "graylog",
  "cluster_uuid" : "F71gNpV-TUSjVbscIkUSTg",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.14.0",
    "build_type" : "deb",
    "build_hash" : "aaa555453f4713d652b52436874e11ba258d8f03",
    "build_date" : "2024-05-09T18:50:48.052504416Z",
    "build_snapshot" : false,
    "lucene_version" : "9.10.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

borisdenis avatar Jun 19 '24 10:06 borisdenis