Kafka rebalancing hangs when using MessagePattern in a microservice
Is there an existing issue for this?
- [X] I have searched the existing issues
Current behavior
A consumer is not assigned to any topic partition after a rollout in Kubernetes (k8s).
Precondition: the microservice must run with 2 or more replicas, each with a Kafka client.
Minimum reproduction code
https://github.com/nestjs/nest/blob/2d13d081d8f0bf7fe592a7d2762baf568403803c/packages/microservices/helpers/kafka-reply-partition-assigner.ts#L123
Steps to reproduce
- Create a Nest microservice with ClientKafka configured to communicate via MessagePattern
- Create a topic with 2 partitions in Kafka
- Subscribe to the topic via the subscribeToResponseOf() method
- Run two instances of the microservice
- Nest will create a “reply” topic for receiving responses (increase the number of partitions to 2 if the reply topic has only one)
- The consumer of the first instance is assigned to partition 0, and the consumer of the second instance to partition 1
- Run a third instance; Nest leaves its consumer without an assigned partition (because there are only 2)
- Shut down the first instance
- Nest starts rebalancing using KafkaReplyPartitionAssigner
Result: the consumer of the second instance is assigned to both partitions, and the consumer of the third instance to none. Because of that, rebalancing is triggered over and over, each time after the rebalanceTimeout period elapses.
The cause lies in the logic that tries to retain previous assignments. The third consumer's previous assignment arrives as null, and Nest simply re-assigns that null value to the same consumer.
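To make the failure mode concrete, here is a simplified TypeScript model of the assigner steps the issue refers to: retain previous assignments, then give a partition to members whose list is empty, plus an assumed round-robin step for leftover partitions. It is a sketch under assumptions, not the KafkaReplyPartitionAssigner source: it handles a single topic, and the names assignSingleTopic, Member and previousPartition are hypothetical. For the scenario above it reproduces the reported outcome.

```typescript
// Hypothetical, simplified single-topic model of a "retain previous
// assignment" partition assigner. NOT the Nest source code.
type PartitionId = number | null;

interface Member {
  memberId: string;
  // undefined: nothing to retain; null: the value reported by a consumer
  // that previously held no partition (as described in the issue).
  previousPartition?: PartitionId;
}

function assignSingleTopic(
  members: Member[],
  partitions: number[],
): Record<string, PartitionId[]> {
  const remaining = [...partitions];
  const assignment: Record<string, PartitionId[]> = {};
  const sorted = [...members].sort((a, b) =>
    a.memberId.localeCompare(b.memberId),
  );

  // Step 1: retain previous assignments. Only undefined counts as
  // "nothing to retain", so a null is carried over unchanged.
  for (const m of sorted) {
    assignment[m.memberId] = [];
    const prev = m.previousPartition;
    if (prev !== undefined) {
      assignment[m.memberId].push(prev);
      if (prev !== null) {
        const idx = remaining.indexOf(prev);
        if (idx !== -1) remaining.splice(idx, 1);
      }
    }
  }

  // Step 2: members with an empty list get the first remaining partition.
  // A list holding [null] has length 1, so it is skipped here.
  for (const m of sorted) {
    if (assignment[m.memberId].length === 0 && remaining.length > 0) {
      assignment[m.memberId].push(remaining.shift()!);
    }
  }

  // Step 3 (assumed): spread leftover partitions round-robin.
  remaining.forEach((p, i) => {
    assignment[sorted[i % sorted.length].memberId].push(p);
  });

  return assignment;
}

// Scenario after the first instance (partition 0) has shut down.
const result = assignSingleTopic(
  [
    { memberId: "consumer-b", previousPartition: 1 },
    { memberId: "consumer-c", previousPartition: null },
  ],
  [0, 1],
);
// consumer-b -> [1, 0], consumer-c -> [null]
```

In this model the stale null fills the third consumer's slot, so the empty-list check never fires and the free partition falls through to the round-robin step.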
As far as I understand, the condition on this line, if (assignment[assignee][topic].length === 0) {, needs an additional check for null assignments:
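A hedged sketch of the proposed check, written against a simplified single-topic model rather than the real assigner (needsPartition and fillEmptyMembers are hypothetical helpers, not Nest APIs): a list that is empty or holds only null entries is treated as unassigned, and its contents are replaced with a real partition.

```typescript
// Hypothetical sketch of the proposed condition. NOT the Nest source code.
type PartitionId = number | null;

// A member "needs a partition" if its list is empty or contains only nulls.
function needsPartition(list: PartitionId[]): boolean {
  return list.length === 0 || list.every((p) => p === null);
}

// Give each member without a real partition the first remaining one.
// `assignment` and `remaining` are mutated in place.
function fillEmptyMembers(
  assignment: Record<string, PartitionId[]>,
  remaining: number[],
): void {
  for (const memberId of Object.keys(assignment).sort()) {
    if (needsPartition(assignment[memberId]) && remaining.length > 0) {
      // Replace rather than push, so the stale null is not kept.
      assignment[memberId] = [remaining.shift()!];
    }
  }
}

// Assumed state right after the "retain previous assignment" step.
const assignment: Record<string, PartitionId[]> = {
  "consumer-b": [1],
  "consumer-c": [null],
};
const remaining = [0];
fillEmptyMembers(assignment, remaining);
// consumer-b -> [1], consumer-c -> [0]
```

Replacing the list is a choice of this sketch: pushing would leave [null, 0], which still carries the stale value into the assignment.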
https://github.com/nestjs/nest/blob/2d13d081d8f0bf7fe592a7d2762baf568403803c/packages/microservices/helpers/kafka-reply-partition-assigner.ts#L123
Expected behavior
All consumers are assigned at least one partition after rebalancing.
Package
- [ ] I don't know. Or some 3rd-party package
- [ ] @nestjs/common
- [ ] @nestjs/core
- [X] @nestjs/microservices
- [ ] @nestjs/platform-express
- [ ] @nestjs/platform-fastify
- [ ] @nestjs/platform-socket.io
- [ ] @nestjs/platform-ws
- [ ] @nestjs/testing
- [ ] @nestjs/websockets
- [ ] Other (see below)
Other package
No response
NestJS version
9 (but I suspect it also applies to version 10)
Packages versions
Node.js version
No response
In which operating systems have you tested?
- [ ] macOS
- [ ] Windows
- [X] Linux
Other
No response
Would you like to create a PR for this issue?
Would the issue be fixed if null values were filtered out of the assignment[assignee][topic] list?
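One way to read this suggestion, sketched on a simplified single-topic model (dropNullPartitions is a hypothetical helper, not a Nest API): strip null entries after the retain step and keep the existing length === 0 condition unchanged. In this model that is enough for the third consumer to pick up the free partition; whether it fully fixes the real assigner would still need to be verified against the Nest source.

```typescript
// Hypothetical sketch of the "filter out nulls" idea. NOT the Nest source code.
type PartitionId = number | null;

// Drop null placeholders carried over from a previous assignment so that
// a plain `length === 0` check sees the member as unassigned.
function dropNullPartitions(
  assignment: Record<string, PartitionId[]>,
): Record<string, number[]> {
  const cleaned: Record<string, number[]> = {};
  for (const [memberId, partitions] of Object.entries(assignment)) {
    // The type guard narrows the filtered array to number[].
    cleaned[memberId] = partitions.filter((p): p is number => p !== null);
  }
  return cleaned;
}

// Assumed state right after the "retain previous assignment" step.
const retained: Record<string, PartitionId[]> = {
  "consumer-b": [1],
  "consumer-c": [null],
};
const remaining = [0];

const cleaned = dropNullPartitions(retained);
// The existing condition is left as-is: an empty list gets a partition.
for (const memberId of Object.keys(cleaned).sort()) {
  if (cleaned[memberId].length === 0 && remaining.length > 0) {
    cleaned[memberId].push(remaining.shift()!);
  }
}
// consumer-b -> [1], consumer-c -> [0]
```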