pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[improve] Do not process acks in the Netty thread

Open dlg99 opened this issue 1 year ago • 4 comments

Motivation

Cherry-picked from internal @eolivelli work. Do not block netty threads if cursor has a lot of individuallyDeletedMessages or needs to compress the PositionInfo (separate change)

Modifications

Run acks on pulsar executor

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

NO

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/dlg99/pulsar/pull/16

dlg99 avatar May 28 '24 23:05 dlg99

Do not block netty threads if cursor has a lot of individuallyDeletedMessages or needs to compress the PositionInfo (separate change)

@dlg99 @eolivelli Do you have any performance results to share? What is the improvement when this PR is applied? In what type of use cases does this have an impact?

lhotari avatar May 29 '24 06:05 lhotari

I don't have numbers to share because we did this work in the scope of a case in which an application generates huge lists of individuallyDeletedMessages ranges (millions).

We have other patches to share that will allow compressing the cursor PositionInfo while writing to BookKeeper as well (and also chunk the payload if it doesn't fit 5MB). We also have some other patches to improve the memory overhead caused by Protobuf (it generates tons of garbage on the heap)

As the serialization happens when you call "subscription.acknowledge(xx)" you can see the netty thread blocked for milliseconds.

eolivelli avatar May 29 '24 07:05 eolivelli

@lhotari @eolivelli any objections to merging this? (PR needs an approval)

dlg99 avatar May 29 '24 17:05 dlg99

@lhotari @eolivelli any objections to merging this? (PR needs an approval)

I do have concerns. With asynchronous methods, there's usually a problem with back pressure. (another example: https://github.com/apache/pulsar/pull/22541#issuecomment-2071568113) Perhaps it's not a real issue for acknowledgements. Has this solution been proven with heavy testing and/or production usage?

lhotari avatar May 29 '24 17:05 lhotari