Kafka rebalancing hangs when using Message pattern in microservice

Open smuschevich opened this issue 2 years ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current behavior

After a rollout in k8s, a consumer is left without an assigned topic partition.

Precondition: at least 2 replicas of a microservice with a Kafka client.

Minimum reproduction code

https://github.com/nestjs/nest/blob/2d13d081d8f0bf7fe592a7d2762baf568403803c/packages/microservices/helpers/kafka-reply-partition-assigner.ts#L123

Steps to reproduce

  1. Create a Nest microservice with ClientKafka configured to communicate via MessagePattern
  2. Create a topic with 2 partitions in Kafka
  3. Subscribe to the topic via the subscribeToResponseOf() method
  4. Run two instances of the microservice
  5. Nest will create a “reply” topic for receiving responses (increase the number of partitions to 2 if the reply topic has only one)
  6. The consumer of the first microservice is assigned partition 0, and the consumer of the second one is assigned partition 1
  7. Run one more instance of the microservice; Nest will leave its consumer without an assigned partition (because there are only 2)
  8. Shut down the first microservice
  9. Nest will start rebalancing using KafkaReplyPartitionAssigner

Result: the consumer of the second microservice is assigned two partitions, and the consumer of the third one is assigned none. Because of that, rebalancing is triggered again and again after each rebalanceTimeout period.

The cause of this behavior is the logic that tries to retain previous assignments. The 3rd consumer's previous assignment arrives as null, and Nest re-assigns that null value to the same consumer.

As far as I understand, the condition on this line, if (assignment[assignee][topic].length === 0) {, needs to additionally check for null assignments. https://github.com/nestjs/nest/blob/2d13d081d8f0bf7fe592a7d2762baf568403803c/packages/microservices/helpers/kafka-reply-partition-assigner.ts#L123
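
To illustrate why the length check alone is not enough, here is a minimal sketch. It assumes the 3rd consumer's partition list ends up holding a single null entry; the types, helper names, member ids and topic name are all made up for the example and are not taken from the Nest source.

```typescript
// Hypothetical shapes for illustration; not the real types from @nestjs/microservices.
type PartitionList = Array<number | null>;
type Assignment = Record<string, Record<string, PartitionList>>;

// The check quoted above: a list holding only null still has length 1,
// so the consumer looks like it already owns a partition.
function isEmptyStrict(assignment: Assignment, assignee: string, topic: string): boolean {
  return assignment[assignee][topic].length === 0;
}

// Null-aware variant (an assumption about what the fix could look like):
// the list counts as empty when it contains no real partition number.
function isEmptyNullAware(assignment: Assignment, assignee: string, topic: string): boolean {
  const partitions = assignment[assignee]?.[topic] ?? [];
  return partitions.every((partition) => partition === null);
}

// State assumed after the first instance shuts down (names are made up).
const assignment: Assignment = {
  "consumer-2": { "reply-topic": [1] },
  "consumer-3": { "reply-topic": [null] },
};

console.log(isEmptyStrict(assignment, "consumer-3", "reply-topic"));    // false
console.log(isEmptyNullAware(assignment, "consumer-3", "reply-topic")); // true
```

With the strict check the 3rd consumer is skipped when free partitions are handed out, which would match the result described above; the null-aware check would treat it as still needing a partition.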

Expected behavior

All consumers are assigned to at least one partition after the rebalancing.

Package

  • [ ] I don't know. Or some 3rd-party package
  • [ ] @nestjs/common
  • [ ] @nestjs/core
  • [X] @nestjs/microservices
  • [ ] @nestjs/platform-express
  • [ ] @nestjs/platform-fastify
  • [ ] @nestjs/platform-socket.io
  • [ ] @nestjs/platform-ws
  • [ ] @nestjs/testing
  • [ ] @nestjs/websockets
  • [ ] Other (see below)

Other package

No response

NestJS version

9 (but I guess it also applies to 10)

Packages versions


Node.js version

No response

In which operating systems have you tested?

  • [ ] macOS
  • [ ] Windows
  • [X] Linux

Other

No response

smuschevich avatar Sep 06 '23 16:09 smuschevich

Would you like to create a PR for this issue?

kamilmysliwiec avatar Sep 11 '23 08:09 kamilmysliwiec

Would the issue be fixed if null values were filtered out of the assignment[assignee][topic] list?

jaime-amate avatar Oct 29 '23 17:10 jaime-amate
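
The filtering idea from the last comment could be sketched roughly as follows. This is only an illustration under the assumption that null entries can appear in a per-topic partition list; dropNullPartitions and its types are hypothetical and not part of NestJS or KafkaJS.

```typescript
// Hypothetical helper; the name and types are not part of NestJS.
type TopicPartitions = Record<string, Array<number | null | undefined>>;

// Returns a copy of the per-topic lists with every non-numeric entry removed,
// so a later length === 0 check sees a consumer with only null entries as empty.
function dropNullPartitions(byTopic: TopicPartitions): Record<string, number[]> {
  const cleaned: Record<string, number[]> = {};
  for (const [topic, partitions] of Object.entries(byTopic)) {
    cleaned[topic] = partitions.filter(
      (partition): partition is number => typeof partition === "number",
    );
  }
  return cleaned;
}

// Topic names are made up for the example.
const cleaned = dropNullPartitions({
  "reply-topic": [null],
  "other-topic": [0, null, 1],
});

console.log(cleaned["reply-topic"].length); // 0
console.log(cleaned["other-topic"]);        // partitions 0 and 1 remain
```

Whether filtering the list or extending the condition is the better place for the fix depends on where else the list is read, which this sketch does not cover.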