[improve][client] Reduce CPU usage when the client is idle and message batching is enabled
Fixes #23187
Motivation
`ProducerImpl` uses the `EventLoopGroup` to send batched messages every 1 ms by default, which causes about 14% CPU usage when the client is idle and no messages are being produced.
Modifications
Use a JDK `ScheduledExecutorService` to run this scheduled task.
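As a rough illustration of the modification described above, here is a minimal sketch that arms a flush timer on a dedicated JDK scheduler instead of the connection's event loop. It is not the actual `ProducerImpl` code: the class name `BatchFlushTimerSketch`, the thread name, and the constructor parameters are assumptions made for the example, and only the `scheduleBatchFlushTask()` method name is taken from the discussion.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the producer's batch flush timer (not Pulsar code).
final class BatchFlushTimerSketch implements AutoCloseable {
    // Dedicated JDK scheduler, used here in place of cnx.ctx().executor().
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "batch-flush-timer"); // assumed thread name
                t.setDaemon(true);
                return t;
            });
    private final long flushDelayMicros;
    private final Runnable flushAction;

    BatchFlushTimerSketch(long flushDelayMicros, Runnable flushAction) {
        this.flushDelayMicros = flushDelayMicros;
        this.flushAction = flushAction;
    }

    // Schedules a single flush after the configured delay and returns its future.
    ScheduledFuture<?> scheduleBatchFlushTask() {
        return scheduler.schedule(flushAction, flushDelayMicros, TimeUnit.MICROSECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A caller would construct the timer with a delay of 1000 microseconds to mirror the 1 ms default mentioned in the motivation, and call `close()` when the producer is closed so the extra thread does not outlive it.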
Verifying this change
- [ ] Make sure that the change passes the CI checks.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [X] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
Documentation
- [ ] `doc`
- [ ] `doc-required`
- [X] `doc-not-needed`
- [ ] `doc-complete`
Matching PR in forked repository
PR in forked repository:
@stillerrr Please add the following content to your PR description and select a checkbox:
- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
@poorbarcode @liangyepianzhou @lhotari PTAL
> I don't see how the executor type for the `batchFlushTask` could be the source of high CPU when the client is idling. The `batchFlushTask` is a one-time task and it will only be scheduled when there's an ongoing batch.
In my test environment, I initialize a Pulsar client, create producers for 100 topics, and send some messages to trigger initialization. The `pulsar-client-io` thread then accounts for about 14% CPU usage.
The `pulsar-client-io` thread is created by the `EventLoopGroup` instance in `PulsarClientImpl`.
This `EventLoopGroup` instance is used in `ClientCnx`, and `ProducerImpl` uses it to send batched messages every 1 ms by default.
I checked every place that uses this `EventLoopGroup` instance and found that when I replace `cnx.ctx().executor()` with a `ScheduledExecutorService` instance in the `scheduleBatchFlushTask()` method, CPU usage drops below 1%. Here is my test result.
I think the root cause is that the `EventLoopGroup` cannot go idle because of the frequently scheduled send task.
Is it because `ScheduledExecutorService.schedule` has lower CPU usage than `EventExecutorGroup.schedule`?
> Is it because `ScheduledExecutorService.schedule` has lower CPU usage than `EventExecutorGroup.schedule`?
@hanmz In this case, there shouldn't be any active tasks if the client is idle. Please check my comment https://github.com/apache/pulsar/pull/23188#pullrequestreview-2247309975. That's why this PR doesn't make sense to me.
> In my test environment, I initialize a Pulsar client, create producers for 100 topics, and send some messages to trigger initialization. The `pulsar-client-io` thread then accounts for about 14% CPU usage.
@stillerrr Have you tried profiling with Async Profiler to find out what's going on?
> I think the root cause is that the `EventLoopGroup` cannot go idle because of the frequently scheduled send task.
@stillerrr There shouldn't be any sending tasks at all when the client is idle. Please see my previous comment. That's why it's necessary to profile this with Async Profiler.
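To make the "one-time task" argument concrete, here is a simplified sketch of a flush timer that is armed only while a batch is open and never re-arms itself after flushing. It is an illustration of the behavior described in the review, not the real producer implementation; the class `OneShotBatchFlushSketch`, its fields, and its counters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a producer whose flush task is one-shot (not Pulsar code).
final class OneShotBatchFlushSketch {
    private final ScheduledExecutorService executor;
    private final long delayMillis;
    private final List<String> batch = new ArrayList<>();
    private ScheduledFuture<?> pendingFlush; // null when no flush is scheduled
    private int scheduledCount;              // how many timer tasks were ever scheduled
    private int flushedMessages;             // how many messages were flushed in total

    OneShotBatchFlushSketch(ScheduledExecutorService executor, long delayMillis) {
        this.executor = executor;
        this.delayMillis = delayMillis;
    }

    // Adds a message to the open batch and arms the timer if it is not armed yet.
    synchronized void send(String message) {
        batch.add(message);
        if (pendingFlush == null) {
            scheduledCount++;
            pendingFlush = executor.schedule(this::flush, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Flushes the batch once. It does not reschedule itself, so an idle
    // producer leaves no timer task behind; the next send() arms a new one.
    private synchronized void flush() {
        pendingFlush = null;
        flushedMessages += batch.size();
        batch.clear();
    }

    synchronized int scheduledCount() { return scheduledCount; }
    synchronized int flushedMessages() { return flushedMessages; }
    synchronized boolean hasPendingFlush() { return pendingFlush != null; }
}
```

Under this model, several `send()` calls within one delay window share a single scheduled task, and after the flush runs nothing remains scheduled until the next message arrives. If a profiler shows timer activity on an idle client, that would indicate behavior that differs from this model.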
@hanmz @lhotari The key point is that `ScheduledExecutorService.schedule` has lower CPU usage than `EventExecutorGroup.schedule`.
With batch sending enabled, in a case where the send task is scheduled 1000 times per second, `ScheduledExecutorService.schedule` uses less than 1% CPU, while `EventExecutorGroup.schedule` uses 14% CPU.
In version 2.10.4, the Pulsar producer runs the scheduled task once every 1 ms; the CPU usage results are the ones I posted:
- 14% for `EventExecutorGroup.schedule`
- less than 1% for `ScheduledExecutorService.schedule`
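One way to reproduce the JDK half of this comparison is a small probe that re-arms a no-op task every 1 ms and reads the CPU time of the executor thread. The sketch below is an assumption about how such a measurement could be done, not the test that produced the numbers above: the class `TimerCpuProbe` and its `measure` method are hypothetical, and it assumes a freshly created single-thread executor so that the thread's CPU time is attributable to the timer task. Measuring `EventExecutorGroup.schedule` the same way would require Netty on the classpath and is not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical probe for the cost of a self-rescheduling timer task.
final class TimerCpuProbe {
    // Re-arms a no-op task every periodMillis until durationMillis has elapsed.
    // Returns {number of runs, CPU nanoseconds of the executor thread, or -1 if unavailable}.
    static long[] measure(ScheduledExecutorService executor, long periodMillis, long durationMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        AtomicLong runs = new AtomicLong();
        AtomicLong cpuNanos = new AtomicLong(-1);
        CountDownLatch done = new CountDownLatch(1);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        Runnable task = new Runnable() {
            @Override
            public void run() {
                runs.incrementAndGet();
                if (System.nanoTime() < deadline) {
                    // One-shot schedule that re-arms itself, like a 1 ms flush timer.
                    executor.schedule(this, periodMillis, TimeUnit.MILLISECONDS);
                } else {
                    // Read the CPU time on the executor thread itself.
                    if (threads.isCurrentThreadCpuTimeSupported()) {
                        cpuNanos.set(threads.getCurrentThreadCpuTime());
                    }
                    done.countDown();
                }
            }
        };
        executor.schedule(task, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] {runs.get(), cpuNanos.get()};
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long[] result = measure(executor, 1, 5_000);
        executor.shutdownNow();
        System.out.printf("runs=%d cpuMillis=%d%n", result[0], result[1] / 1_000_000);
    }
}
```

Dividing the reported CPU time by the wall-clock duration gives a per-thread CPU percentage that can be compared against the figures quoted in this thread. A profiler such as Async Profiler remains the better tool for finding out where the time actually goes.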
The description is currently misleading. Re: https://github.com/apache/pulsar/issues/23187#issuecomment-2326291144











