
[feat][broker] PIP-264: Add Java runtime metrics

Open • dragosvictor opened this issue 1 year ago • 1 comment

PIP-264

Motivation

Adds support for exporting Java runtime metrics via the OpenTelemetry pipeline. For ease of implementation, it relies on the existing OTel runtime metrics library, which provides exactly this functionality. Any extra metrics we see fit can be added separately, as needed.

Modifications

Add and initialize a field of type io.opentelemetry.instrumentation.runtimemetrics.java17.RuntimeMetrics in OpenTelemetryService. This object exposes the desired runtime metrics. Placing it inside OpenTelemetryService automatically makes the metrics available to all users of the class (broker, proxy and function workers) with no further work needed. Enable metric collection for all Java features.
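Much of the data behind these metrics is what the JVM already publishes through the standard java.lang.management MXBeans. The sketch below reads a few of those sources directly with plain JDK APIs, without OpenTelemetry. It is not how RuntimeMetrics is implemented. The class RuntimeSnapshot and its methods are hypothetical, and the pairing of each value with a jvm.* name is only approximate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Pulsar or OpenTelemetry): reads JMX sources similar to some jvm.* metrics. */
final class RuntimeSnapshot {

    /** Classes currently loaded; roughly what jvm.class.count reports. */
    static int loadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    /** Live threads split by daemon flag; roughly what jvm.thread.count reports. */
    static Map<Boolean, Integer> threadCountByDaemon() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int total = threads.getThreadCount();
        int daemon = threads.getDaemonThreadCount();
        Map<Boolean, Integer> counts = new LinkedHashMap<>();
        counts.put(true, daemon);
        // The two reads are not atomic, so clamp to avoid a negative count.
        counts.put(false, Math.max(0, total - daemon));
        return counts;
    }

    /** Bytes used per memory pool, keyed by "heap/<pool>" or "non_heap/<pool>"; roughly jvm.memory.used. */
    static Map<String, Long> memoryUsedByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            String type = pool.getType() == MemoryType.HEAP ? "heap" : "non_heap";
            used.put(type + "/" + pool.getName(), usage.getUsed());
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("jvm.class.count  ~ " + loadedClassCount());
        System.out.println("jvm.thread.count ~ " + threadCountByDaemon());
        memoryUsedByPool().forEach((pool, bytes) ->
                System.out.println("jvm.memory.used  ~ " + pool + " = " + bytes));
    }
}
```

Comparing this output with the exported jvm.* series can help sanity-check a deployment. The exporter itself should remain the source of truth.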

Some of the metrics are currently marked as experimental (see doc): jvm.memory.init, jvm.buffer.memory.usage, jvm.buffer.memory.limit, jvm.buffer.count. Others are exposed only if the OTEL_SEMCONV_STABILITY_OPT_IN flag includes the value jvm. Since these metrics are still useful, enable their collection as well.
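The opt-in flag takes a comma-separated list of values, so "includes the value jvm" means that jvm appears as one of the entries. The following is a minimal sketch of that check. It assumes the variable name OTEL_SEMCONV_STABILITY_OPT_IN and exact, case-sensitive matching of trimmed entries. SemconvOptIn and jvmOptedIn are hypothetical names, not an OpenTelemetry API, and the library may parse the value differently.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the opt-in rule; not an OpenTelemetry API. */
final class SemconvOptIn {

    /** True if the comma-separated opt-in value contains the entry "jvm" (assumed exact, case-sensitive match). */
    static boolean jvmOptedIn(String optInValue) {
        if (optInValue == null || optInValue.isBlank()) {
            return false;
        }
        return Arrays.stream(optInValue.split(","))
                .map(String::trim)          // tolerate spaces such as "http, jvm"
                .anyMatch("jvm"::equals);
    }

    public static void main(String[] args) {
        // System.getenv returns null when the variable is unset; jvmOptedIn handles that.
        String value = System.getenv("OTEL_SEMCONV_STABILITY_OPT_IN");
        System.out.println("OTEL_SEMCONV_STABILITY_OPT_IN=" + value + " -> jvm opted in: " + jvmOptedIn(value));
    }
}
```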

The original (Prometheus) metric list can be consulted here.

For a full description of the semantics of the OpenTelemetry metrics, see this.

| OpenTelemetry (new) metric name | Prometheus (old) metric name |
| --- | --- |
| jvm.buffer.count | jvm_buffer_pool_used_buffers |
| jvm.buffer.memory.limit | jvm_buffer_pool_capacity_bytes |
| jvm.buffer.memory.usage | jvm_buffer_pool_used_bytes |
| jvm.class.count | jvm_classes_currently_loaded |
| jvm.class.loaded | jvm_classes_loaded_total |
| jvm.class.unloaded | jvm_classes_unloaded_total |
| jvm.cpu.time | process_cpu_seconds_total |
| jvm.gc.duration | jvm_gc_collection_seconds |
| jvm.memory.committed | jvm_memory_bytes_committed |
| jvm.memory.init | jvm_memory_bytes_init |
| jvm.memory.limit | jvm_memory_bytes_max |
| jvm.memory.used | jvm_memory_bytes_used |
| jvm.memory.used_after_last_gc | jvm_memory_pool_allocated_bytes_total |
| jvm.thread.count | jvm_threads_state, jvm_threads_current and jvm_threads_daemon |
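Dashboards and alerts that query the old Prometheus names will need to be migrated. Below is a minimal sketch of a lookup built from the table above. MetricNameMigration and otelNameFor are hypothetical names, not part of Pulsar. The lookup maps names only. The new metrics carry some detail as attributes instead. For example, jvm.thread.count distinguishes daemon threads through an attribute rather than a separate metric, so migrated queries may also need attribute filters.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical migration aid built from the PR's name table; not part of Pulsar. */
final class MetricNameMigration {

    /** Prometheus (old) name -> OpenTelemetry (new) name. */
    static final Map<String, String> PROMETHEUS_TO_OTEL = new LinkedHashMap<>();

    static {
        PROMETHEUS_TO_OTEL.put("jvm_buffer_pool_used_buffers", "jvm.buffer.count");
        PROMETHEUS_TO_OTEL.put("jvm_buffer_pool_capacity_bytes", "jvm.buffer.memory.limit");
        PROMETHEUS_TO_OTEL.put("jvm_buffer_pool_used_bytes", "jvm.buffer.memory.usage");
        PROMETHEUS_TO_OTEL.put("jvm_classes_currently_loaded", "jvm.class.count");
        PROMETHEUS_TO_OTEL.put("jvm_classes_loaded_total", "jvm.class.loaded");
        PROMETHEUS_TO_OTEL.put("jvm_classes_unloaded_total", "jvm.class.unloaded");
        PROMETHEUS_TO_OTEL.put("process_cpu_seconds_total", "jvm.cpu.time");
        PROMETHEUS_TO_OTEL.put("jvm_gc_collection_seconds", "jvm.gc.duration");
        PROMETHEUS_TO_OTEL.put("jvm_memory_bytes_committed", "jvm.memory.committed");
        PROMETHEUS_TO_OTEL.put("jvm_memory_bytes_init", "jvm.memory.init");
        PROMETHEUS_TO_OTEL.put("jvm_memory_bytes_max", "jvm.memory.limit");
        PROMETHEUS_TO_OTEL.put("jvm_memory_bytes_used", "jvm.memory.used");
        PROMETHEUS_TO_OTEL.put("jvm_memory_pool_allocated_bytes_total", "jvm.memory.used_after_last_gc");
        // Three separate old thread metrics all collapse onto one new metric.
        PROMETHEUS_TO_OTEL.put("jvm_threads_state", "jvm.thread.count");
        PROMETHEUS_TO_OTEL.put("jvm_threads_current", "jvm.thread.count");
        PROMETHEUS_TO_OTEL.put("jvm_threads_daemon", "jvm.thread.count");
    }

    /** Returns the OpenTelemetry name, or null if the old metric is not in the table. */
    static String otelNameFor(String prometheusName) {
        return PROMETHEUS_TO_OTEL.get(prometheusName);
    }

    public static void main(String[] args) {
        for (String old : List.of("jvm_memory_bytes_used", "jvm_threads_daemon", "jvm_info")) {
            System.out.println(old + " -> " + otelNameFor(old));
        }
    }
}
```

Returning null for unknown names (such as jvm_info, which is absent from the table) makes metrics with no listed replacement easy to flag during a migration review.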

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added unit test org.apache.pulsar.opentelemetry.OpenTelemetryServiceTest#testJvmRuntimeMetrics, verifying that the respective metrics are present at runtime.

Does this pull request potentially affect one of the following parts:

  • [x] Dependencies (add or upgrade a dependency): Added the OTel runtime metrics library
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [x] The metrics: Added runtime metrics as described above
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [x] doc-required
  • [ ] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/dragosvictor/pulsar/pull/18

dragosvictor • Apr 29 '24 20:04

There were quite a few OOMs in the last runs. I'll close and reopen to see whether they originated from the changes made in this PR.

lhotari • May 03 '24 20:05