pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel)

Open eolivelli opened this issue 3 years ago • 1 comments

Motivation

Broker can go out of memory due to many reads enqueued on the PersistentDispatcherMultipleConsumers dispatchMessagesThread (that is used in case of dispatcherDispatchMessagesInSubscriptionThread set to true, that is the default value) The limit of the amount of memory retained due to reads MUST take into account also the entries coming from the Cache.

When dispatcherDispatchMessagesInSubscriptionThread is false (the behaviour of Pulsar 2.10) there is some kind of natural (but still unpredictable!!) back pressure mechanism because the thread that receives the entries from BK of the cache dispatches immediately and synchronously the entries to the consumer and releases them

Modifications

  • Add a new component (InflightReadsLimiter) that keeps track of the overall amount of memory retained due to inflight reads.
  • Add a new configuration entry managedLedgerMaxReadsInFlightSizeInMB
  • The feature is disabled by default
  • Add new metrics to track the status of the broker

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/eolivelli/pulsar/pull/19

eolivelli avatar Oct 28 '22 10:10 eolivelli

Broker can go out of memory due to many reads enqueued on the PersistentDispatcherMultipleConsumers dispatchMessagesThread

@eolivelli Does it only happen when the broker has many subscriptions? For one subscription, we always trigger the new read entries operation after sending messages to consumers. We only have a single active read entries operation at a time. Is it able to reproduce?

codelipenghui avatar Nov 01 '22 10:11 codelipenghui

@codelipenghui you are correct, the problems arise when you have a broker with many subscriptions on the same topic (and many topics). There is no broker level guardrail at the moment. With this patch the memory used to handle outbound traffic is capped and the limit is independent from the number of active subscriptions on the broker

eolivelli avatar Nov 10 '22 07:11 eolivelli

Codecov Report

Merging #18245 (9cc5757) into master (b31c5a6) will increase coverage by 0.98%. The diff coverage is 45.65%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #18245      +/-   ##
============================================
+ Coverage     46.98%   47.97%   +0.98%     
+ Complexity    10343     9370     -973     
============================================
  Files           692      613      -79     
  Lines         67766    58415    -9351     
  Branches       7259     6087    -1172     
============================================
- Hits          31842    28026    -3816     
+ Misses        32344    27377    -4967     
+ Partials       3580     3012     -568     
Flag Coverage Δ
unittests 47.97% <45.65%> (+0.98%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...lsar/broker/service/RedeliveryTrackerDisabled.java 50.00% <ø> (ø)
...va/org/apache/pulsar/broker/service/ServerCnx.java 49.20% <ø> (+0.51%) :arrow_up:
...ersistentStreamingDispatcherMultipleConsumers.java 0.00% <0.00%> (ø)
.../java/org/apache/pulsar/client/impl/ClientCnx.java 30.16% <ø> (ø)
...a/org/apache/pulsar/client/impl/TableViewImpl.java 0.00% <0.00%> (ø)
...ar/client/impl/conf/ProducerConfigurationData.java 84.70% <ø> (-0.18%) :arrow_down:
...va/org/apache/pulsar/client/impl/ConsumerImpl.java 15.09% <12.50%> (+0.05%) :arrow_up:
...sistent/PersistentDispatcherMultipleConsumers.java 58.49% <66.66%> (+1.67%) :arrow_up:
...ache/pulsar/broker/ManagedLedgerClientFactory.java 62.16% <100.00%> (+1.05%) :arrow_up:
.../pulsar/broker/service/AbstractBaseDispatcher.java 60.81% <100.00%> (+2.89%) :arrow_up:
... and 134 more

codecov-commenter avatar Nov 10 '22 07:11 codecov-commenter

/pulsarbot run-failure-checks

codelipenghui avatar Nov 11 '22 00:11 codelipenghui