ruby-kafka

bottleneck in producing messages?

Open katpadi opened this issue 4 years ago • 4 comments

I'm using New Relic to find the slowest part of my code, and it points to the `produce` method. I tried both the sync and async producers, but both are slow. It takes about 1-2 seconds to produce a single message, so when I'm producing a batch it sometimes takes 30 seconds!

I also looked at the `ack_message.producer.kafka` logs, and they show delays ranging from 1s to 7s. Is this a bad number? Does it affect the producer's performance in any way? Overall I'm getting a throughput of 220K/minute and I want a higher number. Where can I start tracing this kind of "bottleneck" (if it really is one)?

By the way, I am producing in batches before explicitly calling deliver.

katpadi avatar Dec 10 '20 07:12 katpadi

Did you try switching to rdkafka? We had a similar problem (and a couple of others) and went with rdkafka for message production. I repackaged waterdrop for this purpose: https://github.com/karafka/waterdrop (version 2.0).

mensfeld avatar Dec 10 '20 09:12 mensfeld

> Did you try switching to rdkafka? We had a similar problem (and a couple of others) and went with rdkafka for message production. I repackaged waterdrop for this purpose: https://github.com/karafka/waterdrop (version 2.0).

Hi! Thanks for your response. No, I haven't actually tried other Kafka clients. I'm just curious what you found out and what made you switch? Why do you prefer rdkafka over this gem? What makes it more performant? I can try it too and see for myself.

I'm mainly interested in finding the "bottleneck" in the code, as I'm clueless about where it is. I may also be missing some configuration, which is why I came here.

katpadi avatar Dec 10 '20 16:12 katpadi

@katpadi in order not to spam this thread with other clients' concerns, please open an issue in waterdrop and I can answer there.

mensfeld avatar Dec 10 '20 16:12 mensfeld

@katpadi it could be that you're maxing out how fast the process can produce to the cluster – this could be due to any of a number of factors.

`produce` does not directly cause an API call to Kafka, so it is likely that you're hitting a bound on the internal buffer or queue. Would you be able to run a profiler while this is happening?
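Before reaching for a full profiler, one rough way to narrow this down is to time the buffering step and the delivery step separately. The sketch below is only an illustration: `FakeProducer` and `time_produce_and_deliver` are hypothetical names invented here so the example runs without a broker. A real ruby-kafka producer should be usable in place of the fake one, since the sketch relies only on `produce(value, topic:)` and `deliver_messages`.

```ruby
require "benchmark"

# Hypothetical stand-in for a ruby-kafka producer (not part of the gem).
# It mimics only the two calls used below: produce appends to an in-memory
# buffer, deliver_messages flushes that buffer.
class FakeProducer
  attr_reader :buffer, :delivered

  def initialize
    @buffer = []
    @delivered = 0
  end

  def produce(value, topic:)
    @buffer << [topic, value]
  end

  def deliver_messages
    @delivered += @buffer.size
    @buffer.clear
  end
end

# Times buffering and delivery separately, so it is visible which of the
# two steps accounts for the wall-clock time.
def time_produce_and_deliver(producer, messages, topic:)
  produce_seconds = Benchmark.realtime do
    messages.each { |message| producer.produce(message, topic: topic) }
  end
  deliver_seconds = Benchmark.realtime { producer.deliver_messages }

  { produce_seconds: produce_seconds, deliver_seconds: deliver_seconds }
end

producer = FakeProducer.new
messages = Array.new(1_000) { |i| "message #{i}" }
timings = time_produce_and_deliver(producer, messages, topic: "events")

puts format("produce: %.4fs, deliver: %.4fs",
            timings[:produce_seconds], timings[:deliver_seconds])
```

If most of the time lands in the delivery step, the network or the cluster is the more likely suspect; if the buffering step itself is slow, the time is being spent inside the process before anything reaches Kafka.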

dasch avatar Dec 30 '20 15:12 dasch

Issue has been marked as stale due to a lack of activity.

github-actions[bot] avatar Jun 17 '23 00:06 github-actions[bot]