pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[feat][misc] PIP-264: Implement topic lookup metrics using OpenTelemetry

Open dragosvictor opened this issue 1 year ago • 2 comments

PIP-264

Motivation

Part of PIP-264 related work to convert metrics to OpenTelemetry format. This PR handles topic lookup metrics, as currently defined in the documentation.

Note that metric pulsar_broker_load_manager_bundle_assignment fits better into the Load Balancing metrics section, so is not included in this work.

Modifications

Added metrics as described below:

Old metric name New metric details
pulsar_broker_lookup Name pulsar.broker.lookup.latency
Type Histogram
Description Lookup request latency
Source Internal OpenTelemetry implementation
Attributes (none)
Cardinality 1 per broker
pulsar_broker_lookup_redirects Name pulsar.broker.lookup.redirect
Type LongCounter
Description The number of lookup redirected requests
Source Internal OpenTelemetry implementation
Attributes (none)
Cardinality 1 per broker
pulsar_broker_lookup_answers Name pulsar.broker.lookup.answer
Type LongCounter
Description The number of lookup responses (i.e. not redirected requests)
Source Internal OpenTelemetry implementation
Attributes (none)
Cardinality 1 per broker
pulsar_broker_lookup_failures Name pulsar.broker.lookup.failure
Type LongCounter
Description The number of lookup failures
Source Internal OpenTelemetry implementation
Attributes (none)
Cardinality 1 per broker
pulsar_broker_lookup_pending_requests Name pulsar.broker.lookup.pending.request.usage
Type ObservableLongGauge
Description The number of pending lookup requests in the broker. When it reaches threshold "maxConcurrentLookupRequest" defined in broker.conf, new requests are rejected.
Source Call to BrokerService#getPendingLookupRequest
Attributes (none)
Cardinality 1 per broker
pulsar_broker_topic_load_pending_requests Name pulsar.broker.topic.load.pending.request.usage
Type ObservableLongGauge
Description The number of pending topic load operations in the broker. When it reaches threshold "maxConcurrentTopicLoadRequest" defined in broker.conf, new requests are rejected.
Source Call to BrokerService#getPendingTopicLoadRequests
Attributes (none)
Cardinality 1 per broker

Moved instantiation of PulsarBrokerOpenTelemetry object to PulsarService constructor, as it is needed by the NamespaceService and BrokerService. The OpenTelemetry standard recommends setting up the SDK as early as possible, so this suits us well.

Changed PulsarTestContext to allow OpenTelemetry metrics validation as part of the tests.

Verifying this change

  • [x] Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

(example:)

  • Extended existing lookup tests BrokerServiceLookupTest#testMultipleBrokerLookup and BrokerServiceLookupTest#testLookupConnectionNotCloseIfGetUnloadingExOrMetadataEx to cover the metric validation

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [x] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [x] doc-required
  • [ ] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/dragosvictor/pulsar/pull/11

dragosvictor avatar Feb 15 '24 21:02 dragosvictor

@dragosvictor Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

github-actions[bot] avatar Feb 15 '24 21:02 github-actions[bot]

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (beed0cf) 73.71% compared to head (1e9bc12) 73.59%. Report is 2 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #22058      +/-   ##
============================================
- Coverage     73.71%   73.59%   -0.13%     
- Complexity    32116    32571     +455     
============================================
  Files          1874     1875       +1     
  Lines        139220   139276      +56     
  Branches      15260    15260              
============================================
- Hits         102628   102500     -128     
- Misses        28695    28858     +163     
- Partials       7897     7918      +21     
Flag Coverage Δ
inttests 24.68% <89.55%> (-0.16%) :arrow_down:
systests 24.29% <80.59%> (-0.22%) :arrow_down:
unittests 72.86% <92.53%> (-0.12%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...n/java/org/apache/pulsar/broker/PulsarService.java 83.13% <100.00%> (ø)
...ache/pulsar/broker/namespace/NamespaceService.java 73.03% <100.00%> (+0.79%) :arrow_up:
...pulsar/broker/stats/PulsarBrokerOpenTelemetry.java 100.00% <100.00%> (+9.09%) :arrow_up:
...che/pulsar/opentelemetry/OpenTelemetryService.java 92.00% <ø> (ø)
...va/org/apache/pulsar/common/stats/MetricsUtil.java 50.00% <50.00%> (ø)
...rg/apache/pulsar/broker/service/BrokerService.java 81.38% <88.57%> (+0.26%) :arrow_up:

... and 75 files with indirect coverage changes

codecov-commenter avatar Feb 16 '24 18:02 codecov-commenter

@dragosvictor Thanks. I replied to all comments.

asafm avatar Feb 21 '24 16:02 asafm

@lhotari You're up next on the review wrap up :)

asafm avatar Mar 05 '24 16:03 asafm