[improve][pip] PIP-352: Event time based topic compactor

Open marekczajkowski opened this issue 1 year ago • 6 comments

PIP: 352

Motivation

Currently, there are two types of compactors available: TwoPhaseCompactor and StrategicTwoPhaseCompactor. The latter is used specifically for internal load balancing and is not employed for regular compaction of Pulsar topics. The former handles regular topic compaction and can be configured via CompactionServiceFactory in broker.conf.

I believe it would be advantageous to introduce another type of topic compactor that operates on event time. Such a compactor would retain the messages that external applications expect, in the order they expect, rather than whichever message happened to be stored last. Applications may stamp each message with the current event time when sending, but variations in network conditions or redeliveries can cause messages to be stored in the Pulsar topic in a different order than intended. Compacting on event time instead of storage order would mitigate this.
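To make the motivation concrete, here is a minimal, self-contained sketch of the difference between keeping the last-stored message per key and keeping the message with the highest event time per key. It is an illustration only, not the compactor proposed in the PIP: the `Msg` class, both method names, and the tie-breaking rule (on equal event times the later-stored message wins) are assumptions made for this example, and nothing here uses Pulsar APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Pulsar.
final class EventTimeCompactionSketch {

    /** Simplified message: key, producer-assigned event time, payload. */
    static final class Msg {
        final String key;
        final long eventTime;
        final String value;

        Msg(String key, long eventTime, String value) {
            this.key = key;
            this.eventTime = eventTime;
            this.value = value;
        }
    }

    /** Keeps, for each key, the message that was stored last. */
    static Map<String, Msg> compactByArrival(List<Msg> log) {
        Map<String, Msg> latest = new LinkedHashMap<>();
        for (Msg m : log) {
            latest.put(m.key, m);
        }
        return latest;
    }

    /**
     * Keeps, for each key, the message with the highest event time.
     * Assumption: on equal event times the later-stored message wins.
     */
    static Map<String, Msg> compactByEventTime(List<Msg> log) {
        Map<String, Msg> latest = new LinkedHashMap<>();
        for (Msg m : log) {
            Msg current = latest.get(m.key);
            if (current == null || m.eventTime >= current.eventTime) {
                latest.put(m.key, m);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // "v1" was produced first (event time 100) but stored after "v2",
        // for example because of a redelivery.
        List<Msg> log = List.of(
                new Msg("sensor-1", 200L, "v2"),
                new Msg("sensor-1", 100L, "v1"),
                new Msg("sensor-2", 50L, "a"));

        System.out.println("by arrival:    " + compactByArrival(log).get("sensor-1").value);
        System.out.println("by event time: " + compactByEventTime(log).get("sensor-1").value);
    }
}
```

In this sample, arrival-order compaction keeps the stale `v1` for `sensor-1`, while the event-time variant keeps `v2`, which is the message the producing application considers newest.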

Modifications

Added PIP

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [x] doc
  • [ ] doc-required
  • [ ] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository:

marekczajkowski May 14 '24 13:05

Added a comment about an unsolved challenge: https://github.com/apache/pulsar/pull/22517#issuecomment-2112918186

lhotari May 15 '24 15:05

@lhotari, what are the next steps to proceed?

marekczajkowski Jun 03 '24 07:06

> Added a comment about an unsolved challenge: #22517 (comment)

This has been addressed.

lhotari Jun 19 '24 14:06

> @lhotari what are the next steps to proceed?

I've described this in the email response to the discussion thread: https://lists.apache.org/thread/ocrbhlhs049px5w9mz9gfym4wpq4701f

Please start a new vote thread for PIP-352.

lhotari Jun 19 '24 14:06

Can this be implemented by StrategicTwoPhaseCompactor with another compaction strategy?

heesung-sohn Jun 21 '24 00:06

> Can this be implemented by StrategicTwoPhaseCompactor with another compaction strategy?

Not really. StrategicTwoPhaseCompactor is used specifically for internal load balancing and is not employed for regular compaction of Pulsar topics.

marekczajkowski Jun 21 '24 09:06