FastChat
Update queue_lens when generation ends
With the SHORTEST_QUEUE load-balancing method, queue_length isn't updated when a generation task ends, so requests are distributed across worker processes based on stale queue lengths. To address this, I implemented two improvements:
- Added a heartbeat signal that's sent when a generation task completes, ensuring the queue length is updated accurately.
- Introduced a random selection process for breaking ties when multiple queues have the same length.
These changes should result in more effective and balanced distribution of work across all available worker processes.
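
For reviewers, here is a minimal, self-contained sketch of the two ideas. It is not the code in this PR: `Controller`, `Worker`, `WorkerInfo`, `pick_worker`, and `receive_heartbeat` are hypothetical stand-ins, and the "generation" is a placeholder. It assumes the controller bumps its recorded queue length when it dispatches a request and otherwise only learns the real value from heartbeats.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    # Hypothetical stand-in for the controller's per-worker record.
    queue_length: int = 0


class Controller:
    def __init__(self, worker_names, rng=None):
        self.worker_info = {name: WorkerInfo() for name in worker_names}
        self.rng = rng or random.Random()

    def receive_heartbeat(self, worker_name, queue_length):
        # Overwrite the recorded value with what the worker reports.
        self.worker_info[worker_name].queue_length = queue_length

    def pick_worker(self):
        if not self.worker_info:
            return None
        shortest = min(info.queue_length for info in self.worker_info.values())
        # Collect every worker tied for the shortest queue, then pick one
        # at random so the first worker in the dict is not always favored.
        candidates = [
            name
            for name, info in self.worker_info.items()
            if info.queue_length == shortest
        ]
        chosen = self.rng.choice(candidates)
        # Assumed behavior: count the dispatched request immediately.
        self.worker_info[chosen].queue_length += 1
        return chosen


class Worker:
    def __init__(self, name, controller):
        self.name = name
        self.controller = controller
        self.queue_length = 0

    def generate(self, prompt):
        self.queue_length += 1
        try:
            return prompt.upper()  # placeholder for real generation
        finally:
            # Send a heartbeat as soon as the task ends, so the controller
            # does not keep routing around a worker that is already idle.
            self.queue_length -= 1
            self.controller.receive_heartbeat(self.name, self.queue_length)
```

Without the heartbeat in `finally`, the controller in this sketch would keep the incremented value until the next periodic update, which is the staleness described above.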