ruby-kafka
bottleneck in producing messages?
I'm using New Relic to trace where the slowest part of my code is, and it points to the produce method. I tried both the sync and async producers, but both are slow. It takes about 1-2 seconds for a single message to be produced, so when I produce a batch it sometimes takes 30 seconds!
I also looked at the ack.message_producer.kafka logs, and they show delays ranging from 1s to 7s. Is this a bad number? Does it affect the producer's performance in any way? Overall I'm getting a throughput of 220K/minute and I'd like a higher number. Where can I start tracing this kind of "bottleneck" (if it really is one)?
By the way, I am producing in batches before explicitly calling deliver.
Did you try switching to rdkafka? We had a similar problem (and a couple of others) and went with rdkafka for message production. I repackaged waterdrop for this purpose: https://github.com/karafka/waterdrop (version 2.0).
Hi! Thanks for your response. No, I haven't actually tried other Kafka clients. I'm just curious what you found out and what made you switch? Why do you prefer rdkafka over this gem? What makes it more performant? I can try it too and see for myself.
I'm mainly interested in finding the "bottleneck" in the code, as I'm clueless about where it is. I also suspect I might be missing some configuration, which is why I came here.
@katpadi so as not to spam this thread with concerns about other clients, please open an issue in waterdrop and I can answer there.
@katpadi it could be that you're maxing out how fast the process can produce to the cluster; this could be due to any of a number of factors. produce does not directly cause an API call to Kafka, so it is likely that you're hitting a bound on the internal buffer or queue. Would you be able to run a profiler while this is happening?
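Short of a full profiler, one rough way to see whether the time goes into buffering or into delivery is to time the two phases separately. The sketch below is only an illustration: StubProducer is a hypothetical stand-in, not ruby-kafka's producer, and PhaseTimer is an invented helper. The only assumption is that the wrapped object responds to produce and deliver_messages, the way a ruby-kafka producer does.

```ruby
# Hypothetical stand-in for a buffering producer. This is NOT the ruby-kafka
# implementation; it only mimics the "buffer first, send later" shape.
class StubProducer
  attr_reader :delivered

  def initialize(flush_cost: 0.0)
    @buffer = []
    @delivered = 0
    @flush_cost = flush_cost # simulated network round trip, in seconds
  end

  # Appends to an in-memory buffer; no network call happens here.
  def produce(value, topic:)
    @buffer << [topic, value]
  end

  # Simulates sending the whole buffer in one go.
  def deliver_messages
    sleep(@flush_cost)
    @delivered += @buffer.size
    @buffer.clear
  end
end

# Invented helper: wraps a producer and accumulates wall-clock time per phase.
class PhaseTimer
  attr_reader :totals, :counts

  def initialize(producer)
    @producer = producer
    @totals = Hash.new(0.0)
    @counts = Hash.new(0)
  end

  def produce(value, topic:)
    measure(:produce) { @producer.produce(value, topic: topic) }
  end

  def deliver_messages
    measure(:deliver) { @producer.deliver_messages }
  end

  private

  # Uses the monotonic clock so system clock adjustments do not skew results.
  def measure(phase)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @totals[phase] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @counts[phase] += 1
  end
end

stub = StubProducer.new(flush_cost: 0.05)
timer = PhaseTimer.new(stub)

1_000.times { |i| timer.produce("message #{i}", topic: "events") }
timer.deliver_messages

timer.totals.each do |phase, seconds|
  puts format("%-8s calls=%d total=%.4fs", phase, timer.counts[phase], seconds)
end
```

If the produce total dominates with a real producer wrapped this way, the time is being spent before any request reaches the cluster (buffering, serialization, or waiting on a full queue); if deliver dominates, the broker round trip is the more likely suspect.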
Issue has been marked as stale due to a lack of activity.