
[improve][broker] Reschedule reads with increasing backoff when no messages are dispatched

Open: lhotari opened this issue 1 year ago

Main Issue: #23200

Motivation

There is currently a clear problem with Key_Shared subscriptions: in normal operation, they cause a large number of "ack holes", which result in several problems. One is the latency issue explained in #23200. Another is that the number of "ack holes" exceeds managedLedgerMaxUnackedRangesToPersist (10000) in common cases, such as the demonstration in #23200.
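To make the term "ack hole" concrete, here is a minimal Java sketch of a simplified model, not Pulsar's ManagedCursor API. It assumes entries are identified by consecutive long ids starting at 0 and that the mark-delete position is the end of the contiguous acknowledged prefix. Every run of individually acknowledged ids above that position is preceded by at least one unacknowledged id (an ack hole), and these acknowledged ranges are the kind of state that a limit such as managedLedgerMaxUnackedRangesToPersist bounds. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified model of a cursor's acknowledgment state; not Pulsar's ManagedCursor API.
final class AckHoleCounter {

    /** Highest id such that every id from 0 up to it is acknowledged; -1 if entry 0 is unacked. */
    static long markDeletePosition(TreeSet<Long> acked) {
        long markDelete = -1;
        while (acked.contains(markDelete + 1)) {
            markDelete++;
        }
        return markDelete;
    }

    /**
     * Counts contiguous runs of acknowledged ids above the mark-delete position.
     * Each run is preceded by at least one unacknowledged id (an "ack hole"),
     * so in this model the number of runs equals the number of holes.
     */
    static int countAckedRanges(TreeSet<Long> acked) {
        long previous = markDeletePosition(acked);
        int ranges = 0;
        // Only ids strictly above the mark-delete position are tracked individually.
        for (long id : acked.tailSet(previous, false)) {
            if (id != previous + 1) {
                ranges++; // a gap precedes this id, so a new acknowledged range starts
            }
            previous = id;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Entries 0-2 are acked (mark-delete position 2); 3, 6 and 8 are still unacked.
        TreeSet<Long> acked = new TreeSet<>(List.of(0L, 1L, 2L, 4L, 5L, 7L, 9L));
        System.out.println(markDeletePosition(acked)); // 2
        System.out.println(countAckedRanges(acked));   // 3
    }
}
```

With out-of-order acknowledgments spread across many keys, each unacknowledged entry left behind splits the acknowledged state into another range, which is how the count can grow past a fixed limit.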

There are multiple other issues in which Pulsar users experienced problems involving a large number of "ack holes". One previous mitigation is PIP-299: "Stop dispatch messages if the individual acks will be lost in the persistent storage". The need for PIP-299 shows that a large number of "ack holes" is a fairly common problem.

Modifications

While experimenting with #23200, I found that the changes in #7105 were related to the cause of the issue. I also noticed that #18315 contained some impactful changes (https://github.com/apache/pulsar/pull/18315/files#diff-c48d5c94842ac8c9a0c9314b207298069f38c8dcfeda4a9886fb3bb1f77843f2).

Based on this information, I decided to implement a backoff for the case where no messages are dispatched. This PR reschedules the call to readMoreEntries with a delay that increases exponentially for as long as no entries are dispatched. The backoff delay starts at 100 ms and is capped at 5000 ms. These values are currently static, but they could be made configurable.
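The backoff policy described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the code in this PR: the class ReadBackoff and its method delayAfterRound are hypothetical names, doubling is assumed as the growth factor, and a returned delay of 0 stands for "call readMoreEntries immediately". A dispatcher using it would hand any non-zero delay to a scheduler that calls readMoreEntries later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the backoff policy; names are illustrative, not Pulsar's.
final class ReadBackoff {
    private final long initialDelayMs;
    private final long maxDelayMs;
    private long nextDelayMs;

    ReadBackoff(long initialDelayMs, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.nextDelayMs = initialDelayMs;
    }

    /**
     * Returns how long to wait before the next read.
     * 0 means "read again immediately": entries were dispatched, so the backoff resets.
     * Otherwise the delay doubles on each consecutive empty round, capped at maxDelayMs.
     */
    long delayAfterRound(int dispatchedEntries) {
        if (dispatchedEntries > 0) {
            nextDelayMs = initialDelayMs;
            return 0;
        }
        long delay = nextDelayMs;
        nextDelayMs = Math.min(nextDelayMs * 2, maxDelayMs);
        return delay;
    }

    public static void main(String[] args) {
        // 100 ms initial delay, 5000 ms cap, as described in this PR.
        ReadBackoff backoff = new ReadBackoff(100, 5000);
        // Eight empty rounds, one round that dispatches 3 entries, then another empty round.
        int[] dispatchedPerRound = {0, 0, 0, 0, 0, 0, 0, 0, 3, 0};
        List<Long> delays = new ArrayList<>();
        for (int dispatched : dispatchedPerRound) {
            delays.add(backoff.delayAfterRound(dispatched));
        }
        System.out.println(delays);
        // [100, 200, 400, 800, 1600, 3200, 5000, 5000, 0, 100]
    }
}
```

Resetting to the initial delay as soon as anything is dispatched keeps the added latency small once consumers can accept messages again, while the cap bounds the wait during long stretches with nothing to dispatch.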

Additional context

While testing this change, I happened to notice that it mitigates the problem in the reproducer of #23200.

With the changes in this PR, the reproducer gives these results:

2024-08-26T16:09:42,328+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Done receiving. Remaining: 0 duplicates: 0 unique: 1000000
max latency difference of subsequent messages: 974 ms
max ack holes: 668
2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer1 received 259642 unique messages 0 duplicates in 456 s, max latency difference of subsequent messages 763 ms
2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer2 received 233963 unique messages 0 duplicates in 456 s, max latency difference of subsequent messages 974 ms
2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer3 received 244279 unique messages 0 duplicates in 457 s, max latency difference of subsequent messages 898 ms
2024-08-26T16:09:42,329+0300 [main] INFO  playground.TestScenarioIssueKeyShared - Consumer consumer4 received 262116 unique messages 0 duplicates in 456 s, max latency difference of subsequent messages 657 ms

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

lhotari, Aug 26 '24 13:08