pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

Event time based compaction

Open marekczajkowski opened this issue 1 year ago • 5 comments

Draft with Event time based topic compaction

marekczajkowski avatar Apr 16 '24 12:04 marekczajkowski

@marekczajkowski Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

github-actions[bot] avatar Apr 16 '24 12:04 github-actions[bot]

One of the challenges with event time order compacting is that it would require some additional configuration and logic for the acceptable time skew (late arrival of events). @marekczajkowski Any thoughts on how to solve that challenge?

lhotari avatar May 15 '24 15:05 lhotari

One of the challenges with event time order compacting is that it would require some additional configuration and logic for the acceptable time skew (late arrival of events). @marekczajkowski Any thoughts on how to solve that challenge?

I might be overthinking this. Perhaps it doesn't matter that much. :)

lhotari avatar May 15 '24 16:05 lhotari

One of the challenges with event time order compacting is that it would require some additional configuration and logic for the acceptable time skew (late arrival of events). @marekczajkowski Any thoughts on how to solve that challenge?

Can you explain what you mean by that ? I don't quite get it

marekczajkowski avatar May 20 '24 06:05 marekczajkowski

One of the challenges with event time order compacting is that it would require some additional configuration and logic for the acceptable time skew (late arrival of events). @marekczajkowski Any thoughts on how to solve that challenge?

Can you explain what you mean by that ? I don't quite get it

I think I just wasn't thinking through it. It's not a problem since the last message with the most recent event time "wins". It's eventually consistent.

lhotari avatar May 20 '24 06:05 lhotari

@marekczajkowski Do you have tests for the new behavior? How about the solution for configuring the compactor?

lhotari avatar Aug 01 '24 11:08 lhotari

@marekczajkowski Do you have tests for the new behavior? How about the solution for configuring the compactor?

I have added unit tests. As for configuring the compactor there is no new mechanism. It uses existing 'compactionServiceFactoryClassName' property name to switch the factory that creates compactor

marekczajkowski avatar Aug 05 '24 12:08 marekczajkowski

I have added unit tests. As for configuring the compactor there is no new mechanism. It uses existing 'compactionServiceFactoryClassName' property name to switch the factory that creates compactor

Thanks. The PR itself is ok.

Configuring the compactor will require some updates in documentation. It would be great if our documention presents this feature, the purpose of it and shows how to enable it. I guess the minimal approach would be to add the configuration key to broker.conf and standalone.conf and document the configuration key in the comments.

lhotari avatar Aug 13 '24 11:08 lhotari

I have added unit tests. As for configuring the compactor there is no new mechanism. It uses existing 'compactionServiceFactoryClassName' property name to switch the factory that creates compactor

Thanks. The PR itself is ok.

Configuring the compactor will require some updates in documentation. It would be great if our documention presents this feature, the purpose of it and shows how to enable it. I guess the minimal approach would be to add the configuration key to broker.conf and standalone.conf and document the configuration key in the comments.

Sure. Both configs updated with documenting comments

marekczajkowski avatar Aug 19 '24 07:08 marekczajkowski

/pulsarbot rerun-failure-checks

marekczajkowski avatar Aug 23 '24 13:08 marekczajkowski

Codecov Report

Attention: Patch coverage is 70.55394% with 101 lines in your changes missing coverage. Please review.

Project coverage is 74.53%. Comparing base (bbc6224) to head (e0a1c70). Report is 543 commits behind head on master.

Files Patch % Lines
...e/pulsar/compaction/AbstractTwoPhaseCompactor.java 76.88% 37 Missing and 12 partials :warning:
...che/pulsar/compaction/EventTimeOrderCompactor.java 36.76% 35 Missing and 8 partials :warning:
.../compaction/EventTimeCompactionServiceFactory.java 0.00% 5 Missing :warning:
...he/pulsar/compaction/PublishingOrderCompactor.java 93.18% 2 Missing and 1 partial :warning:
...g/apache/pulsar/client/impl/RawBatchConverter.java 90.90% 0 Missing and 1 partial :warning:
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #22517      +/-   ##
============================================
+ Coverage     73.57%   74.53%   +0.96%     
- Complexity    32624    33654    +1030     
============================================
  Files          1877     1924      +47     
  Lines        139502   144548    +5046     
  Branches      15299    15822     +523     
============================================
+ Hits         102638   107743    +5105     
+ Misses        28908    28543     -365     
- Partials       7956     8262     +306     
Flag Coverage Δ
inttests 27.50% <41.98%> (+2.92%) :arrow_up:
systests 24.70% <31.19%> (+0.37%) :arrow_up:
unittests 73.89% <70.26%> (+1.05%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...va/org/apache/pulsar/compaction/CompactorTool.java 68.91% <100.00%> (ø)
...pache/pulsar/compaction/MessageCompactionData.java 100.00% <100.00%> (ø)
...sar/compaction/PulsarCompactionServiceFactory.java 82.60% <100.00%> (ø)
.../pulsar/compaction/StrategicTwoPhaseCompactor.java 73.72% <ø> (-2.97%) :arrow_down:
...g/apache/pulsar/client/impl/RawBatchConverter.java 94.25% <90.90%> (+0.35%) :arrow_up:
...he/pulsar/compaction/PublishingOrderCompactor.java 93.18% <93.18%> (ø)
.../compaction/EventTimeCompactionServiceFactory.java 0.00% <0.00%> (ø)
...che/pulsar/compaction/EventTimeOrderCompactor.java 36.76% <36.76%> (ø)
...e/pulsar/compaction/AbstractTwoPhaseCompactor.java 76.88% <76.88%> (ø)

... and 496 files with indirect coverage changes

codecov-commenter avatar Aug 23 '24 14:08 codecov-commenter