pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[improve][broker] Add msgInReplay subscription stat and metric to improve Key_Shared observability

Open lhotari opened this issue 1 year ago • 2 comments

Main Issue: #23205

Motivation

Currently there aren't ways to observe the number of messages that are "in replay" with Key_Shared or Shared subscriptions. Having these metrics would make it easier to understand the behavior of issues such as #23200 . Please check #23205 for more reasons.

Modifications

  • add msgInReplay in the similar way as the existing msgDelayed subscription stat and metric.

Documentation

  • [ ] doc
  • [x] doc-required
  • [ ] doc-not-needed
  • [ ] doc-complete

lhotari avatar Aug 26 '24 10:08 lhotari

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 74.54%. Comparing base (bbc6224) to head (d6d08be). :warning: Report is 1267 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23224      +/-   ##
============================================
+ Coverage     73.57%   74.54%   +0.96%     
- Complexity    32624    34226    +1602     
============================================
  Files          1877     1922      +45     
  Lines        139502   144768    +5266     
  Branches      15299    15832     +533     
============================================
+ Hits         102638   107916    +5278     
+ Misses        28908    28578     -330     
- Partials       7956     8274     +318     
Flag Coverage Δ
inttests 27.66% <28.57%> (+3.08%) :arrow_up:
systests 24.69% <28.57%> (+0.36%) :arrow_up:
unittests 73.90% <100.00%> (+1.05%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ervice/persistent/MessageRedeliveryController.java 95.38% <100.00%> (+0.38%) :arrow_up:
...sistent/PersistentDispatcherMultipleConsumers.java 73.37% <100.00%> (-0.96%) :arrow_down:
...ker/service/persistent/PersistentSubscription.java 77.54% <100.00%> (+0.85%) :arrow_up:
...ker/stats/prometheus/AggregatedNamespaceStats.java 100.00% <100.00%> (ø)
.../stats/prometheus/AggregatedSubscriptionStats.java 100.00% <ø> (ø)
...ker/stats/prometheus/NamespaceStatsAggregator.java 94.24% <100.00%> (+3.83%) :arrow_up:
...che/pulsar/broker/stats/prometheus/TopicStats.java 98.42% <100.00%> (+4.22%) :arrow_up:
...mon/policies/data/stats/SubscriptionStatsImpl.java 94.40% <100.00%> (+0.28%) :arrow_up:

... and 534 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Aug 26 '24 17:08 codecov-commenter

LGTM, but maybe the change needs a PIP?

dao-jun avatar Aug 27 '24 01:08 dao-jun

+1 for having a PIP for this change since it actually changed the Public API output and metrics.

Another point is can we only have new the field in the topic stats? Since we have a lot of metrics on the topic level. Finally, we can't add everything to the prometheus metrics. The stats, the logs are also the part of observability.

@codelipenghui This is not a breaking change in metrics and this change isn't adding significantly more metrics than what currently exists. It's currently a gap in observability of all released versions of Pulsar when msgInReplay is missing. That's why everyone needs this and making it configurable with a feature flag wouldn't be sensible.

lhotari avatar Aug 29 '24 04:08 lhotari