kafka-connect-reddit

Extremely high CPU load

Open yupm opened this issue 5 years ago • 4 comments

Can you do a "docker stats" and have a look at the load? I am getting 200% CPU load at the start and 100% after a few minutes. This is from polling a few subreddits.

yupm avatar Mar 28 '20 16:03 yupm

I'm also seeing this issue.

monksy avatar Apr 20 '20 16:04 monksy

What does the image you're using to run the connector look like? Is it based on https://github.com/C0urante/kafka-connect-reddit/pull/12?

Also, where are Kafka and ZooKeeper being run?

C0urante avatar Apr 27 '20 03:04 C0urante

I'm investigating some of the CPU stats; they seem a little high for me as well. That being said, I'm a big fan of this build so far (I happen to be running on a Hortonworks Hadoop cluster).

GabeChurch avatar Feb 07 '21 03:02 GabeChurch

Update: this is likely caused by the connector polling from Reddit asynchronously on separate threads and returning immediately from RedditSourceTask::poll, even when there are no records available. The thread from Connect that calls poll enters a tight loop where each iteration is essentially a no-op.

The first solution that comes to mind is to introduce some artificial latency here by waiting in poll for a small period (say, 1-5 seconds) for records to become available when there are none. We still want to periodically return something to Connect since stop will be invoked on that same thread; if the connector is reading from extremely low-throughput subreddits, its tasks could become zombies if they never return from poll, even when some external event such as a rebalance or a user reconfiguration requires that they be shut down.
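
A minimal sketch of the bounded-wait idea, not the connector's actual code: it assumes the asynchronous reader threads hand records over through an in-memory queue. The class name BoundedWaitBuffer, its methods, and the maxWaitMs parameter are all hypothetical, and the record type is left generic so the example runs without Kafka Connect on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (assumption): not part of kafka-connect-reddit or Kafka Connect.
final class BoundedWaitBuffer<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final long maxWaitMs;

    BoundedWaitBuffer(long maxWaitMs) {
        this.maxWaitMs = maxWaitMs;
    }

    // Called from the asynchronous reader threads whenever a record arrives.
    void add(T record) {
        queue.add(record);
    }

    // Intended to back the task's poll method: blocks for at most maxWaitMs
    // instead of returning immediately, so the calling thread does not spin.
    List<T> poll() {
        T first;
        try {
            first = queue.poll(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and hand control back to the caller.
            Thread.currentThread().interrupt();
            return null;
        }
        if (first == null) {
            // Timed out with nothing to return; the caller regains control
            // and can react to a pending shutdown.
            return null;
        }
        List<T> batch = new ArrayList<>();
        batch.add(first);
        // Collect anything else that is already buffered, without waiting.
        queue.drainTo(batch);
        return batch;
    }
}
```

Returning null on timeout matches what SourceTask::poll is allowed to return when no data is available. Blocking on the queue rather than sleeping for a fixed interval means records are still handed to Connect as soon as they arrive, while an idle task wakes up only once per maxWaitMs.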

C0urante avatar May 24 '21 01:05 C0urante