Cooperative Rebalance
These changes add incremental cooperative rebalancing support to node-rdkafka. librdkafka has supported this type of rebalancing for nearly four years, so this update brings node-rdkafka in line with librdkafka's capabilities.
We use the code from this fork in our production environment for services that handle 5000 requests/messages per second. Using the cooperative-sticky partition assignment strategy has shown good results by avoiding "stop-the-world" rebalances. Specifically, we have observed increased throughput and fewer spikes when pods scale up and down in Kubernetes.
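For anyone who wants to try it, here is a rough sketch of how a consumer could opt in. `partition.assignment.strategy: cooperative-sticky` is the standard librdkafka setting, and -175 / -174 are librdkafka's assign/revoke error codes. Everything else is an assumption for illustration: the `CooperativeConsumer` interface is a hypothetical stand-in (not the real node-rdkafka typings), and the method names `incrementalAssign` / `incrementalUnassign` are assumed, so check the diff for the actual API.

```typescript
// Hypothetical minimal shapes for illustration; NOT the real node-rdkafka typings.
interface TopicPartition {
  topic: string;
  partition: number;
}

interface RebalanceError {
  code: number;
}

interface CooperativeConsumer {
  incrementalAssign(assignments: TopicPartition[]): void; // assumed method name
  incrementalUnassign(assignments: TopicPartition[]): void; // assumed method name
}

// librdkafka error codes delivered to the rebalance callback.
const ERR__ASSIGN_PARTITIONS = -175;
const ERR__REVOKE_PARTITIONS = -174;

// Builds a consumer config that opts in to cooperative rebalancing.
// The group id and broker list are whatever your deployment uses.
function buildConsumerConfig(groupId: string, brokers: string): Record<string, string> {
  return {
    "group.id": groupId,
    "metadata.broker.list": brokers,
    "partition.assignment.strategy": "cooperative-sticky",
  };
}

// With cooperative rebalancing, the callback receives only the partitions
// being added or removed, so they are applied incrementally instead of
// replacing the whole assignment.
function handleRebalance(
  consumer: CooperativeConsumer,
  err: RebalanceError,
  assignments: TopicPartition[],
): "assigned" | "revoked" | "ignored" {
  if (err.code === ERR__ASSIGN_PARTITIONS) {
    consumer.incrementalAssign(assignments);
    return "assigned";
  }
  if (err.code === ERR__REVOKE_PARTITIONS) {
    consumer.incrementalUnassign(assignments);
    return "revoked";
  }
  return "ignored";
}
```

In a real consumer, `handleRebalance` would be called from the `rebalance_cb` config option. Partitions that are not part of the incremental change keep being consumed during the rebalance, which is what avoids the stop-the-world pause.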
@GaryWilber Any way we could get some eyes on cooperative rebalancing? :) This would be huge for us.
@iradul Do you have access to trigger the PR test checks/workflow for this? I don't see a way to trigger it myself.
@SeanReece @GaryWilber @iradul Guys, how can we speed up the testing and merging of this PR? It would be very useful for the whole community.
@SeanReece @GaryWilber @iradul Please help us finish this enhancement.
FWIW, we've decided to move to https://github.com/confluentinc/confluent-kafka-javascript, which supports cooperative rebalance and provides a node-rdkafka-compatible API.
@serj026, from my understanding, the only thing preventing this PR from being merged is that the automatic build hasn’t started. I believe this is because the PR/commits were created before the system was fully in place.
Could you try adding an empty commit or recreating the PR with the same code? This should re-trigger the build.
@SeanReece We tried using confluent-kafka-javascript when the code appeared in master, but encountered a potential memory leak during our e2e tests. We have our own wrapper for node-rdkafka, and in some tests we observed memory usage climbing to as much as 5.5GB, compared to around 580MB with node-rdkafka.
Looks like that did not trigger the build. I have no idea why it did not help, since other PRs do not seem to have the same issue.
CI trigger definition: https://github.com/Blizzard/node-rdkafka/blob/master/.github/workflows/test.yml
OK, I was confused. It looks like we need "workflow approval" (something separate from PR approval) just to run the CI. @GaryWilber @iradul Any chance you can help here and trigger the CI?
Thank you!