
[feat][pip] PIP-448: Topic-level Delayed Message Tracker for Memory Optimization

Open Denovo1998 opened this issue 2 months ago • 3 comments

Main Issue: #24600

Motivation

The primary motivation for this proposal is to address the high memory consumption caused by the current per-subscription delayed message tracking mechanism. For topics with hundreds or thousands of subscriptions, the memory footprint for delayed messages becomes prohibitively large. Each delayed message's position is duplicated across every subscription's tracker, leading to a memory usage pattern of O(num_delayed_messages * num_subscriptions).
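To make the growth pattern concrete, here is a minimal sketch that counts stored index entries under the two layouts. It is illustrative only and is not Pulsar's tracker code: the class name `DelayedTrackerFootprint`, both methods, and the use of a `TreeMap` keyed by delivery time are assumptions, and the "topic-level" layout shown is just one plausible reading of a shared index with a lightweight per-subscription cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration; none of these names exist in Pulsar.
public class DelayedTrackerFootprint {

    // Per-subscription layout: every subscription keeps its own copy of the
    // delayed index, so stored entries grow as messages * subscriptions.
    static long perSubscriptionEntries(int delayedMessages, int subscriptions) {
        List<TreeMap<Long, long[]>> trackers = new ArrayList<>();
        for (int s = 0; s < subscriptions; s++) {
            TreeMap<Long, long[]> tracker = new TreeMap<>();
            for (int m = 0; m < delayedMessages; m++) {
                // key: deliverAt timestamp (assumed unique here), value: {ledgerId, entryId}
                tracker.put((long) m, new long[] {1L, m});
            }
            trackers.add(tracker);
        }
        return trackers.stream().mapToLong(TreeMap::size).sum();
    }

    // Assumed topic-level layout: one shared index per topic, plus a single
    // long per subscription recording how far that subscription has advanced.
    static long topicLevelEntries(int delayedMessages, int subscriptions) {
        TreeMap<Long, long[]> shared = new TreeMap<>();
        for (int m = 0; m < delayedMessages; m++) {
            shared.put((long) m, new long[] {1L, m});
        }
        long[] cursors = new long[subscriptions]; // O(subscriptions) extra state, not counted as index entries
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println("per-subscription entries: " + perSubscriptionEntries(1_000, 200));
        System.out.println("topic-level entries:      " + topicLevelEntries(1_000, 200));
    }
}
```

With 1,000 delayed messages and 200 subscriptions, the first layout holds 200,000 index entries and the second holds 1,000, which is the O(num_delayed_messages * num_subscriptions) versus O(num_delayed_messages) difference described above.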

This excessive memory usage can cause:

  • Increased memory pressure on Pulsar brokers.
  • More frequent and longer Garbage Collection (GC) pauses, impacting broker performance.
  • Potential OutOfMemoryErrors, leading to broker instability.
  • Limited scalability for use cases that rely on many subscriptions per topic, such as IoT or large-scale microservices with shared subscriptions.

By optimizing the delayed message tracking to be more memory-efficient, we can enhance broker stability and scalability, allowing Pulsar to better support these critical use cases.

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Denovo1998 avatar Nov 01 '25 12:11 Denovo1998

I think this is a good improvement. Is there a test or benchmark that can visually demonstrate the results?

Technoboy- avatar Dec 02 '25 01:12 Technoboy-

Yes, as in https://github.com/apache/pulsar/pull/24739, a JMH benchmark is needed:

https://github.com/Denovo1998/pulsar/blob/738d0e2c41cdf7cc727789f348ebeda4089b9353/microbench/src/main/java/org/apache/pulsar/broker/delayed/bucket/BucketDelayedDeliveryTrackerBenchmark.java

Denovo1998 avatar Dec 03 '25 13:12 Denovo1998

I recently made some simple adjustments to OpenMessaging Benchmark to support the delayed message feature, while testing many PRs related to the delayed message module.

https://github.com/apache/pulsar/issues/24600#issuecomment-3591687762

For this PR, we must first monitor memory, but the most important thing is to test whether a problem in one subscription affects the other subscriptions when a topic has a large number of them.

Any suggestions for the result comparison? Which metrics should we visualize? @Technoboy- @codelipenghui @lhotari @coderzc

Denovo1998 avatar Dec 03 '25 13:12 Denovo1998