node-rdkafka

Cooperative Rebalance

Open · serj026 opened this issue

The purpose of these changes is to add a functional version of node-rdkafka that supports incremental cooperative rebalancing. Given that librdkafka has supported this type of rebalancing for nearly four years, this update aims to bring node-rdkafka in line with librdkafka's capabilities and enhance its functionality.

We use the code from this fork in production for services that handle 5000 requests/messages per second. The cooperative-sticky partition assignment strategy has worked well for us because it avoids "stop-the-world" rebalances. Specifically, we have observed higher throughput and smaller spikes when scaling pods up and down in Kubernetes.
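To illustrate why the cooperative protocol avoids stop-the-world pauses, here is a minimal sketch that compares what each protocol asks a single consumer to give up during a rebalance. `rebalanceDiff` and the `orders` topic are hypothetical and not part of node-rdkafka's API; the sketch also simplifies the real protocol, where revoked partitions are handed to other members in a follow-up rebalance. In librdkafka, the cooperative protocol is selected with `partition.assignment.strategy=cooperative-sticky`.

```javascript
// Hypothetical helper (not node-rdkafka API): given a consumer's current and
// target assignments as "topic-partition" strings, return which partitions the
// consumer must revoke and which it must newly assign under each protocol.
function rebalanceDiff(current, target, protocol) {
  const cur = new Set(current);
  const tgt = new Set(target);
  if (protocol === 'EAGER') {
    // Eager: every owned partition is revoked (consumption stops), then the
    // full new assignment is handed out again.
    return { revoked: [...cur], assigned: [...tgt] };
  }
  if (protocol === 'COOPERATIVE') {
    // Incremental cooperative: only partitions that move away are revoked,
    // and only partitions the consumer did not already own are added.
    return {
      revoked: [...cur].filter((p) => !tgt.has(p)),
      assigned: [...tgt].filter((p) => !cur.has(p)),
    };
  }
  throw new Error(`unknown protocol: ${protocol}`);
}

// A new pod joins the group and takes over orders-2 and orders-3.
const before = ['orders-0', 'orders-1', 'orders-2', 'orders-3'];
const after = ['orders-0', 'orders-1'];

// Eager: all four partitions are revoked, then orders-0 and orders-1 come back.
console.log(rebalanceDiff(before, after, 'EAGER'));
// Cooperative: only orders-2 and orders-3 are revoked; orders-0 and orders-1 keep flowing.
console.log(rebalanceDiff(before, after, 'COOPERATIVE'));
```

The partitions that stay put under the cooperative protocol are exactly the ones that keep processing messages during the rebalance, which is where the throughput gain during scaling comes from.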

serj026, Jun 17 '24

@GaryWilber Any way we could get some eyes on cooperative rebalancing? :) This would be huge for us.

SeanReece, Aug 13 '24

@iradul Do you have access to trigger the PR test checks/workflow for this? I don't see a way to trigger it myself.

GaryWilber, Aug 15 '24

@SeanReece @GaryWilber @iradul Guys, how can we speed up testing and merging this PR? It would be very useful for the whole community.

neuralspin, Aug 28 '24

@SeanReece @GaryWilber @iradul Please help finish this enhancement.

neuralspin, Oct 15 '24

FWIW, we've decided to move to https://github.com/confluentinc/confluent-kafka-javascript, which supports cooperative rebalancing and provides a node-rdkafka-compatible API.

SeanReece, Oct 16 '24

@serj026, as I understand it, the only thing preventing this PR from being merged is that the automatic CI build hasn't started. I believe this is because the PR/commits were created before the CI workflow was fully in place.

Could you try adding an empty commit or recreating the PR with the same code? This should re-trigger the build.
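A minimal sketch of the empty-commit approach. It runs in a throwaway repository so it is safe to try as-is; the demo identity and commit messages are placeholders. On the actual PR branch, only the `git commit --allow-empty` step followed by `git push` is needed.

```shell
#!/bin/sh
set -e
# Throwaway repository for the demo; on a real PR branch, skip these setup steps.
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q

# Placeholder identity so the demo works without any global git config.
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "Initial commit"

# --allow-empty records a commit with no file changes; pushing such a commit
# to the PR branch is a common way to ask CI to run again.
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "Re-trigger CI"

git rev-list --count HEAD  # prints 2
```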

wiktor-obrebski, Nov 14 '24

@SeanReece We tried confluent-kafka-javascript when the code appeared in master, but ran into a potential memory leak during our e2e tests. We have our own wrapper around node-rdkafka, and in some tests memory usage grew to as much as 5.5GB, compared with around 580MB under node-rdkafka.

serj026, Nov 18 '24

Looks like it did not trigger the build. I have no idea why it didn't help, since other PRs don't seem to have the same issue.

CI trigger definition: https://github.com/Blizzard/node-rdkafka/blob/master/.github/workflows/test.yml

wiktor-obrebski, Nov 18 '24

OK, I was confused. It looks like we need "workflow approval" (separate from PR approval) just to run the CI. @GaryWilber @iradul Any chance you can help here and trigger the CI?

wiktor-obrebski, Nov 18 '24

Thank you!

wiktor-obrebski, Nov 19 '24