pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[improve] Handle PositionInfo that's too large to serialize as a single entry

Open dlg99 opened this issue 1 year ago • 4 comments
trafficstars

Motivation

In some cases cursor position info can be too large to serialize as a single entry, e.g. in case of too many deleted ranges. Also the serialization can be too slow.

cherry-picks of changes by @eolivelli @nicoloboschi and I.

Modifications

Cursor PositionInfo serialization is reworked to produce less garbage/serialize faster; serialized data can be compressed. In case the serialized data too large it is chunked and saved as a sequence of entries.

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

This change added tests.

Does this pull request potentially affect one of the following parts:

NO

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/dlg99/pulsar/pull/17

dlg99 avatar May 29 '24 23:05 dlg99

I wonder if this is related to PIP-81 which doesn't seem to have an implementation: https://github.com/apache/pulsar/wiki/PIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries

There was a larger PR for PIP-81 that was closed: #10729 Some parts of it were split, such as #15425 and #15607. @codelipenghui @315157973 any details about future PIP-81 plans to share?

lhotari avatar Aug 16 '24 10:08 lhotari

Btw. I'm currently investigating a Key_Shared subscription type issue where ordinary consumption of message leads to a very large number of "ack holes". The WIP test app where this is reproduced is https://github.com/lhotari/pulsar-playground/blob/master/src/main/java/com/github/lhotari/pulsar/playground/TestScenarioIssueKeyShared.java . The test class is not yet simplified to contain the relevant parts. I started with a very complex test case and it seems that the "ack hole" problem shows up in all possible cases.

No messages get lost. It's just that some messages don't get delivered until all other messages have been processed.

lhotari avatar Aug 16 '24 10:08 lhotari

In the 1M message experiment, the number of ack holes goes down from about 150k ack holes to <500 with this experiment: https://github.com/lhotari/pulsar/commit/a3b06391fc63635e7a2b86ed51652cde4b51c117

lhotari avatar Aug 16 '24 13:08 lhotari

It's possible that the root cause of this issue of large PositionInfo is #23200 and it is addressed with PRs #23231 and #23226. There's #23224 for observability. Large msgInReplay counts would confirm the root cause.

lhotari avatar Aug 27 '24 13:08 lhotari

this is a real problem and it has been solved with a simple and fundamentally proven solution with perf numbers : https://github.com/apache/pulsar/pull/9292

But again I am not sure some folks blocked this PR without saying the reason even after asking multiple times and blocked the progress on this PR.

rdhabalia avatar Sep 20 '24 23:09 rdhabalia

I wonder if this is related to PIP-81 which doesn't seem to have an implementation: https://github.com/apache/pulsar/wiki/PIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries

There was a larger PR for PIP-81 that was closed: #10729 Some parts of it were split, such as #15425 and #15607. @codelipenghui @315157973 any details about future PIP-81 plans to share?

Since a PR implemented the compression of PositionInfo, the size of PositionInfo can be greatly reduced, and the problem of Entry size exceeding the threshold will no longer occur, so this PIP was not further promoted.

315157973 avatar Sep 21 '24 10:09 315157973

I wonder if this is related to PIP-81 which doesn't seem to have an implementation: https://github.com/apache/pulsar/wiki/PIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries There was a larger PR for PIP-81 that was closed: #10729 Some parts of it were split, such as #15425 and #15607. @codelipenghui @315157973 any details about future PIP-81 plans to share?

Since a PR implemented the compression of PositionInfo, the size of PositionInfo can be greatly reduced, and the problem of Entry size exceeding the threshold will no longer occur, so this PIP was not further promoted.

@315157973 Are you referring to PIP-146: ManagedCursorInfo compression or ManagedLedgerInfo compression (does that contain a PIP?) ? What if the size exceeds the threshold after compression?

lhotari avatar Sep 23 '24 06:09 lhotari

Codecov Report

Attention: Patch coverage is 38.12317% with 844 lines in your changes missing coverage. Please review.

Project coverage is 74.19%. Comparing base (bbc6224) to head (54157d8). Report is 629 commits behind head on master.

Files with missing lines Patch % Lines
...che/bookkeeper/mledger/impl/PositionInfoUtils.java 28.24% 681 Missing and 20 partials :warning:
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java 61.20% 110 Missing and 25 partials :warning:
.../apache/bookkeeper/mledger/impl/MetaStoreImpl.java 60.00% 4 Missing and 2 partials :warning:
...e/bookkeeper/mledger/impl/LedgerMetadataUtils.java 50.00% 1 Missing and 1 partial :warning:
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #22799      +/-   ##
============================================
+ Coverage     73.57%   74.19%   +0.61%     
- Complexity    32624    34041    +1417     
============================================
  Files          1877     1937      +60     
  Lines        139502   146587    +7085     
  Branches      15299    16087     +788     
============================================
+ Hits         102638   108756    +6118     
- Misses        28908    29472     +564     
- Partials       7956     8359     +403     
Flag Coverage Δ
inttests 27.41% <20.60%> (+2.83%) :arrow_up:
systests 24.42% <19.94%> (+0.09%) :arrow_up:
unittests 73.56% <38.12%> (+0.71%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...apache/bookkeeper/mledger/ManagedLedgerConfig.java 96.55% <100.00%> (+0.25%) :arrow_up:
...bookkeeper/mledger/ManagedLedgerFactoryConfig.java 90.47% <100.00%> (+1.00%) :arrow_up:
...org/apache/pulsar/broker/ServiceConfiguration.java 98.97% <100.00%> (-0.43%) :arrow_down:
...ache/pulsar/broker/ManagedLedgerClientFactory.java 90.24% <100.00%> (+6.09%) :arrow_up:
...rg/apache/pulsar/broker/service/BrokerService.java 83.61% <100.00%> (+2.83%) :arrow_up:
...e/bookkeeper/mledger/impl/LedgerMetadataUtils.java 91.66% <50.00%> (-8.34%) :arrow_down:
.../apache/bookkeeper/mledger/impl/MetaStoreImpl.java 83.72% <60.00%> (-2.19%) :arrow_down:
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java 76.73% <61.20%> (-2.57%) :arrow_down:
...che/bookkeeper/mledger/impl/PositionInfoUtils.java 28.24% <28.24%> (ø)

... and 603 files with indirect coverage changes

codecov-commenter avatar Sep 24 '24 01:09 codecov-commenter

@lhotari I got rid of byte[] usage and added a feature flag, per your comments

dlg99 avatar Sep 26 '24 17:09 dlg99

overall let us organize code more appropriately and I am not comfortable with such heavy implementation for a simple feature and it should go to separate interface implementation. I will not block this PR but let's not rush to merge PR and please do not merge this PR without addressing such fundamental separation to avoid creating a messy code base.

rdhabalia avatar Oct 03 '24 19:10 rdhabalia