Extremely high CPU load
Can you do a "docker stats" and have a look at the load? I am getting 200% CPU load at the start and 100% after a few minutes. This is from polling a few subreddits.
I'm also seeing this issue.
What does the image you're using to run the connector look like? Is it based on https://github.com/C0urante/kafka-connect-reddit/pull/12?
Also, where are Kafka and ZooKeeper being run?
I'm investigating some of the CPU stats; they seem a little high for me as well. That being said, I'm a big fan of this build so far (I happen to be running on a Hortonworks Hadoop cluster).
Update: this is likely caused by the connector polling from Reddit asynchronously on separate threads and returning immediately from RedditSourceTask::poll, even when there are no records available. The thread from Connect that calls poll enters a tight loop where each iteration is essentially a no-op.
The first solution that comes to mind is to introduce some artificial latency here by waiting in poll for a small period (say, 1-5 seconds) for records to become available when there are none. We still want to periodically return something to Connect since stop will be invoked on that same thread; if the connector is reading from extremely low-throughput subreddits, its tasks could become zombies if they don't return, even when some external event such as a rebalance or a user reconfiguration requires that they be shut down.
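
For illustration, here is a rough sketch of the bounded wait, not the connector's actual code. The class name BoundedWaitPoller, the offer method, and the maxWaitMs parameter are all hypothetical; it assumes the background threads hand records to the task through a java.util.concurrent.BlockingQueue. It returns null on timeout, which Connect's SourceTask.poll contract allows when no data is available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the task's record buffer (an assumption, not taken
// from the connector's source).
final class BoundedWaitPoller<T> {
    private final BlockingQueue<T> buffer = new LinkedBlockingQueue<>();
    private final long maxWaitMs;

    BoundedWaitPoller(long maxWaitMs) {
        this.maxWaitMs = maxWaitMs;
    }

    // Called by the background threads that read from Reddit.
    void offer(T record) {
        buffer.add(record);
    }

    // Blocks for at most maxWaitMs waiting for the first record, so the calling
    // thread sleeps instead of spinning. Returns null if nothing arrived in time,
    // which hands control back to the caller so it can still invoke stop.
    List<T> poll() throws InterruptedException {
        T first = buffer.poll(maxWaitMs, TimeUnit.MILLISECONDS);
        if (first == null) {
            return null;
        }
        List<T> batch = new ArrayList<>();
        batch.add(first);
        // Take whatever else is already buffered without waiting any longer.
        buffer.drainTo(batch);
        return batch;
    }
}
```

With maxWaitMs somewhere in the 1000-5000 range, an idle task would wake up a few times per second at most rather than looping continuously, while still returning often enough for a shutdown to go through.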