
Update queue_lens when generation ends

Open lcw99 opened this issue 1 year ago • 0 comments

With the SHORTEST_QUEUE load-balancing method, queue_length is not updated when a generation task ends, so work is balanced inaccurately across worker processes. To address this, I implemented two improvements:

  1. Added a heartbeat signal that's sent when a generation task completes, ensuring the queue length is updated accurately.
  2. Introduced a random selection process for breaking ties when multiple queues have the same length.

These changes should result in more effective and balanced distribution of work across all available worker processes.
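
The two changes can be illustrated with a minimal, self-contained sketch. It is not the actual FastChat code: `Controller`, `WorkerInfo`, `receive_heartbeat`, and `pick_worker` are hypothetical names chosen for illustration, and incrementing the count at dispatch time is an assumption of the sketch. It shows only the idea of refreshing the queue length when a generation ends and picking randomly among workers tied for the shortest queue.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    # Hypothetical record kept by the controller for each worker.
    queue_length: int = 0


class Controller:
    """Illustrative shortest-queue dispatcher (not the FastChat implementation)."""

    def __init__(self, rng=None):
        self.workers = {}
        # An injectable RNG makes the tie-breaking reproducible in tests.
        self.rng = rng or random.Random()

    def register(self, name):
        self.workers[name] = WorkerInfo()

    def receive_heartbeat(self, name, queue_length):
        # Improvement 1: a worker reports its queue length when a generation
        # ends, so the controller no longer relies on a stale value.
        self.workers[name].queue_length = queue_length

    def pick_worker(self):
        if not self.workers:
            return None
        shortest = min(w.queue_length for w in self.workers.values())
        tied = [n for n, w in self.workers.items() if w.queue_length == shortest]
        # Improvement 2: break ties randomly instead of always taking the first.
        name = self.rng.choice(tied)
        # Assumption: the controller counts the dispatched request right away.
        self.workers[name].queue_length += 1
        return name


if __name__ == "__main__":
    controller = Controller()
    for worker_name in ("worker-a", "worker-b"):
        controller.register(worker_name)
    chosen = controller.pick_worker()
    # Simulate the end of generation: the worker reports an empty queue.
    controller.receive_heartbeat(chosen, 0)
    print(chosen, controller.workers)
```

Without the heartbeat on completion, the count for a worker would only grow between periodic updates, so an idle worker could still look busy to the controller.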

lcw99, Aug 22 '24 22:08