[feat][broker] PIP-264: Add Java runtime metrics
Motivation
Adds support for exporting Java runtime metrics via the OpenTelemetry pipeline. For ease of implementation, this relies on the OTel runtime metrics library, which already provides this functionality. Additional metrics can be added separately as needed.
Modifications
Add and initialize a field of type `io.opentelemetry.instrumentation.runtimemetrics.java17.RuntimeMetrics` in `OpenTelemetryService`. This object exposes the desired runtime metrics. Placing it inside `OpenTelemetryService` makes it available to all users of the class (broker, proxy and function workers) without any further work. Enable metric collection for all Java features.
Some of the metrics are presently marked as experimental (see doc): `jvm.memory.init`, `jvm.buffer.memory.usage`, `jvm.buffer.memory.limit`, `jvm.buffer.count`. Others are exposed only if the flag `OTEL_SEMCONV_STABILITY_OPT_IN` includes the value `jvm`. Since they are still useful, enable their collection too.
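To give a feel for what these metrics measure, here is a minimal sketch that reads comparable values straight from the JDK's `java.lang.management` beans. It is not the code in this PR and does not use the OTel library; the class name `JvmRuntimeSnapshot` is hypothetical, and the sums across buffer pools and memory areas are a simplification (the OTel metrics report these per pool, with attributes).

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only: a one-off snapshot of JVM values,
// keyed by the OTel metric name that is closest in meaning.
final class JvmRuntimeSnapshot {

    static Map<String, Long> collect() {
        Map<String, Long> values = new TreeMap<>();

        // Class loading counters.
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        values.put("jvm.class.count", (long) classes.getLoadedClassCount());
        values.put("jvm.class.loaded", classes.getTotalLoadedClassCount());
        values.put("jvm.class.unloaded", classes.getUnloadedClassCount());

        // Live thread count (daemon and non-daemon together).
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        values.put("jvm.thread.count", (long) threads.getThreadCount());

        // Heap plus non-heap memory, summed here for brevity.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        values.put("jvm.memory.used",
                memory.getHeapMemoryUsage().getUsed() + memory.getNonHeapMemoryUsage().getUsed());
        values.put("jvm.memory.committed",
                memory.getHeapMemoryUsage().getCommitted() + memory.getNonHeapMemoryUsage().getCommitted());

        // Buffer pools (for example direct and mapped), summed across pools.
        long count = 0;
        long used = 0;
        long capacity = 0;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            count += pool.getCount();
            used += Math.max(0, pool.getMemoryUsed()); // -1 means "no estimate available"
            capacity += pool.getTotalCapacity();
        }
        values.put("jvm.buffer.count", count);
        values.put("jvm.buffer.memory.usage", used);
        values.put("jvm.buffer.memory.limit", capacity);

        return values;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running `main` prints one `name = value` line per entry; the actual numbers depend on the JVM and its current state.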
The original (Prometheus) metric list can be consulted here.
For a full description of the semantics of the OpenTelemetry metrics, see this.
| OpenTelemetry (new) metric name | Prometheus (old) metric name |
|---|---|
| jvm.buffer.count | jvm_buffer_pool_used_buffers |
| jvm.buffer.memory.limit | jvm_buffer_pool_capacity_bytes |
| jvm.buffer.memory.usage | jvm_buffer_pool_used_bytes |
| jvm.class.count | jvm_classes_currently_loaded |
| jvm.class.loaded | jvm_classes_loaded_total |
| jvm.class.unloaded | jvm_classes_unloaded_total |
| jvm.cpu.time | process_cpu_seconds_total |
| jvm.gc.duration | jvm_gc_collection_seconds |
| jvm.memory.committed | jvm_memory_bytes_committed |
| jvm.memory.init | jvm_memory_bytes_init |
| jvm.memory.limit | jvm_memory_bytes_max |
| jvm.memory.used | jvm_memory_bytes_used |
| jvm.memory.used_after_last_gc | jvm_memory_pool_allocated_bytes_total |
| jvm.thread.count | jvm_threads_state, jvm_threads_current and jvm_threads_daemon |
Verifying this change
- [ ] Make sure that the change passes the CI checks.
This change added tests and can be verified as follows:
- Added unit test `org.apache.pulsar.opentelemetry.OpenTelemetryServiceTest#testJvmRuntimeMetrics`, verifying that the respective metrics are present at runtime.
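The presence check can be sketched roughly as follows. This is not the body of the actual test: the class name `RuntimeMetricPresence` and the `missing` helper are hypothetical, the expected names are taken from the table above, and the set of exported names is assumed to come from whatever metric reader the test uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only: reports which of the expected
// OTel runtime metric names are absent from a set of exported metric names.
final class RuntimeMetricPresence {

    // Metric names from the mapping table in this description.
    static final List<String> EXPECTED = List.of(
            "jvm.buffer.count",
            "jvm.buffer.memory.limit",
            "jvm.buffer.memory.usage",
            "jvm.class.count",
            "jvm.class.loaded",
            "jvm.class.unloaded",
            "jvm.cpu.time",
            "jvm.gc.duration",
            "jvm.memory.committed",
            "jvm.memory.init",
            "jvm.memory.limit",
            "jvm.memory.used",
            "jvm.memory.used_after_last_gc",
            "jvm.thread.count");

    // Returns the expected names that are not in 'exported', in table order.
    static List<String> missing(Set<String> exported) {
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!exported.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulate an export in which one metric has not been reported.
        Set<String> exported = new HashSet<>(EXPECTED);
        exported.remove("jvm.gc.duration");
        System.out.println(missing(exported)); // prints [jvm.gc.duration]
    }
}
```

A test built on this idea would assert that `missing(...)` returns an empty list for the names collected at runtime.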
Does this pull request potentially affect one of the following parts:
- [x] Dependencies (add or upgrade a dependency): Added the OTel runtime metrics library
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [x] The metrics: Added runtime metrics as described above
- [ ] Anything that affects deployment
Documentation
- [ ] doc
- [x] doc-required
- [ ] doc-not-needed
- [ ] doc-complete
Matching PR in forked repository
PR in forked repository: https://github.com/dragosvictor/pulsar/pull/18
There were quite a few OOMs in the last runs. I'll close and reopen to see whether they originated from the changes made in this PR.