
[improve][client]Reduce CPU usage when client idle and batch message enable

Open stillerrr opened this issue 1 year ago • 8 comments

Reduce CPU usage when client idle and batch message enable

Fixes #23187

Motivation

ProducerImpl uses the EventLoopGroup to send batched messages every 1 ms by default. This causes about 14% CPU usage when the client is idle and no messages are being produced.

Modifications

Use the JDK ScheduledExecutorService to run this scheduled task.
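A minimal sketch of the idea, not the actual patch: a one-shot flush task submitted to a JDK ScheduledExecutorService instead of a Netty event loop. The class name `JdkBatchFlushSketch`, the counter, and the 1 ms delay passed in `demo()` are illustrative assumptions; only the method name `scheduleBatchFlushTask` is taken from the discussion, and its body here is a stand-in.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the producer's flush scheduling; this is not Pulsar code.
class JdkBatchFlushSketch {
    private final ScheduledExecutorService scheduler;
    private final long delayMicros;
    private final AtomicInteger flushCount = new AtomicInteger();

    JdkBatchFlushSketch(ScheduledExecutorService scheduler, long delayMicros) {
        this.scheduler = scheduler;
        this.delayMicros = delayMicros;
    }

    // Schedules a single flush; the task does not reschedule itself.
    ScheduledFuture<?> scheduleBatchFlushTask() {
        return scheduler.schedule(() -> { flushCount.incrementAndGet(); },
                delayMicros, TimeUnit.MICROSECONDS);
    }

    int flushCount() {
        return flushCount.get();
    }

    // Schedules one flush with a 1 ms delay, waits for it, and returns the flush count.
    static int demo() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            JdkBatchFlushSketch sketch = new JdkBatchFlushSketch(scheduler, 1_000);
            sketch.scheduleBatchFlushTask().get(5, TimeUnit.SECONDS);
            return sketch.flushCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("flushes run: " + demo());
    }
}
```

A dedicated scheduler thread like this changes the threading model, which is why that box is checked in the list further down.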

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [X] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [X] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository:

stillerrr, Aug 17 '24 02:08

@stillerrr Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

github-actions[bot], Aug 17 '24 02:08

@poorbarcode @liangyepianzhou @lhotari PTAL

stillerrr, Aug 20 '24 03:08

Reduce CPU usage when client idle and batch message enable

I don't see how the executor type for the batchFlushTask could be the source of high CPU when the client is idling. The batchFlushTask is a one-time task and it will only be scheduled when there's an ongoing batch.

In my test environment, I initialize a Pulsar client, create producers for 100 topics, and send some messages to trigger initialization. CPU usage is then about 14%, caused by the pulsar-client-io thread. [screenshot]

This "pulsar-client-io" thread is created by the EventLoopGroup instance in PulsarClientImpl. [screenshots]

This EventLoopGroup instance is used in ClientCnx, and ProducerImpl uses it to send batched messages every 1 ms by default. [screenshots]

I checked every place that uses this EventLoopGroup instance and found that when I replace cnx.ctx().executor() with a ScheduledExecutorService instance in the scheduleBatchFlushTask() method, CPU usage drops below 1%. Here is my test result: [screenshot]

I think the root cause is that the EventLoopGroup cannot go idle because of the frequent send task.

stillerrr, Aug 20 '24 08:08

Is it because ScheduledExecutorService.schedule has lower cpu usage than EventExecutorGroup.schedule?

hanmz, Aug 23 '24 03:08

Is it because ScheduledExecutorService.schedule has lower cpu usage than EventExecutorGroup.schedule?

@hanmz In this case, there shouldn't be any active tasks if the client is idle. Please check my comment: https://github.com/apache/pulsar/pull/23188#pullrequestreview-2247309975. That's why this PR doesn't make sense to me.

lhotari, Aug 23 '24 04:08

In my test environment, I initialize a Pulsar client, create producers for 100 topics, and send some messages to trigger initialization. CPU usage is then about 14%, caused by the pulsar-client-io thread.

@stillerrr Have you tried profiling with Async Profiler to find out what's going on?

lhotari, Aug 23 '24 04:08

I think the root cause is that the EventLoopGroup cannot go idle because of the frequent send task.

@stillerrr There shouldn't be any sending tasks at all when the client is idle. Please see my previous comment. That's why it's necessary to profile this with Async Profiler.
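To make the "no tasks while idle" expectation concrete, here is a minimal sketch of a one-time flush task that is scheduled only while a batch is open. It is an illustration under assumptions, not Pulsar's implementation: the class `OneShotFlushSketch`, its fields, and the 200 ms delay are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a producer whose flush task exists only while a batch is open.
class OneShotFlushSketch {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;
    private final List<String> batch = new ArrayList<>();
    private final CountDownLatch firstFlush = new CountDownLatch(1);
    private boolean flushPending = false;
    private int scheduledTasks = 0;
    private int flushedMessages = 0;

    OneShotFlushSketch(ScheduledExecutorService scheduler, long delayMillis) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
    }

    // Adds a message; schedules a flush only if none is already pending.
    synchronized void send(String message) {
        batch.add(message);
        if (!flushPending) {
            flushPending = true;
            scheduledTasks++;
            scheduler.schedule(this::flush, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Flushes the open batch once and does not reschedule itself.
    private synchronized void flush() {
        flushedMessages += batch.size();
        batch.clear();
        flushPending = false;
        firstFlush.countDown();
    }

    synchronized int scheduledTasks() { return scheduledTasks; }

    synchronized int flushedMessages() { return flushedMessages; }

    // Returns {tasks scheduled while idle, tasks scheduled after three sends, messages flushed}.
    static int[] demo() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            OneShotFlushSketch producer = new OneShotFlushSketch(scheduler, 200);
            int whileIdle = producer.scheduledTasks();
            producer.send("a");
            producer.send("b");
            producer.send("c");
            if (!producer.firstFlush.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("flush did not run");
            }
            return new int[] {whileIdle, producer.scheduledTasks(), producer.flushedMessages()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("idle=" + r[0] + " scheduled=" + r[1] + " flushed=" + r[2]);
    }
}
```

In this model an idle producer leaves nothing on the executor, so the executor type alone would not explain idle CPU usage; whether a given client version behaves this way is exactly what profiling would show.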

lhotari, Aug 23 '24 04:08

@hanmz @lhotari The key point is that ScheduledExecutorService.schedule uses less CPU than EventExecutorGroup.schedule. With batching enabled, I created a case that runs the send task 1000 times per second: ScheduledExecutorService.schedule uses less than 1% CPU, but EventExecutorGroup.schedule uses 14% CPU.

In version 2.10.4, the Pulsar producer runs the scheduled task once every 1 ms. The CPU usage results are the ones I posted:
14% for EventExecutorGroup.schedule [screenshot]

less than 1% for ScheduledExecutorService.schedule [screenshot]
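One rough way to put a number on the JDK side of this comparison is to read the scheduler thread's CPU time through `ThreadMXBean` while a 1 ms periodic task runs. The sketch below is an assumption-laden illustration: the class `SchedulerCpuSketch` and its parameters are hypothetical, it exercises only the JDK scheduler and not Netty's `EventExecutorGroup`, and it is not a substitute for profiling the real client.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical measurement harness; covers only the JDK scheduler.
class SchedulerCpuSketch {

    // Returns {task runs, scheduler-thread CPU nanoseconds, or -1 if unavailable}.
    static long[] measure(long periodMicros, long durationMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuTimeAvailable =
                threads.isThreadCpuTimeSupported() && threads.isThreadCpuTimeEnabled();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicLong runs = new AtomicLong();
        try {
            // Ask the single worker thread for its own id so its CPU time can be read.
            Callable<Long> whoAmI = () -> Thread.currentThread().getId();
            long threadId = scheduler.submit(whoAmI).get();

            long before = cpuTimeAvailable ? threads.getThreadCpuTime(threadId) : -1;
            scheduler.scheduleAtFixedRate(runs::incrementAndGet,
                    periodMicros, periodMicros, TimeUnit.MICROSECONDS);
            Thread.sleep(durationMillis);
            long after = cpuTimeAvailable ? threads.getThreadCpuTime(threadId) : -1;

            long cpuNanos = (before < 0 || after < 0) ? -1 : after - before;
            return new long[] {runs.get(), cpuNanos};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long[] r = measure(1_000, 2_000); // 1 ms period, 2 s window
        System.out.println("runs=" + r[0] + ", scheduler thread CPU ns=" + r[1]);
    }
}
```

Dividing the CPU nanoseconds by the wall-clock window gives an approximate per-thread utilization that can be compared with the percentages above.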

stillerrr, Aug 26 '24 13:08

Reduce CPU usage when client idle and batch message enable

The description is currently misleading. Re: https://github.com/apache/pulsar/issues/23187#issuecomment-2326291144

lhotari, Sep 03 '24 11:09