
[Serve] Make batching work with multiplexing

Open abrarsheikh opened this issue 3 weeks ago • 1 comment

Fixes https://github.com/ray-project/ray/issues/56633

  • [x] Add documentation
  • [x] Update get_multiplexed_model_id to check first whether we are in a batch context
  • [x] Update the batching logic
  • [x] Add tests
  • [x] Does not introduce any backwards incompatibility: previously the system gave no guarantee about the contents of a batch, and now we add a constraint that guarantees each batch contains requests for the same model.
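
To illustrate the "batch context first" lookup described in the checklist, here is a minimal sketch using Python's standard `contextvars`. This is not Ray Serve's implementation: the variables `_batch_model_id` and `_request_model_id` and the function `lookup_model_id` are hypothetical names chosen for this example. It only shows the general idea that a model ID set for the whole batch takes precedence over the per-request one.

```python
import contextvars

# Hypothetical context variables for illustration; Ray Serve's internals may differ.
_batch_model_id = contextvars.ContextVar("batch_model_id", default=None)
_request_model_id = contextvars.ContextVar("request_model_id", default="")


def lookup_model_id() -> str:
    """Return the batch-level model ID if one is set, else the request-level one."""
    batch_id = _batch_model_id.get()
    if batch_id is not None:
        # Inside a batch: every request in the batch shares this model ID.
        return batch_id
    # Outside a batch: fall back to the ID attached to the current request.
    return _request_model_id.get()


# Usage: a request-level ID is visible until a batch-level ID overrides it.
_request_model_id.set("model_a")
outside_batch = lookup_model_id()

token = _batch_model_id.set("model_b")
inside_batch = lookup_model_id()

_batch_model_id.reset(token)  # leaving the batch restores the request-level ID
after_batch = lookup_model_id()
```

Resetting the token when the batch handler returns keeps the batch-level ID from leaking into later, unbatched calls.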

The thing I dislike about this implementation is that it does not fill the batch when the replica is responsible for more than 2 models and incoming traffic is equally distributed between those models, because the current implementation fills the batch first and then divides it by model.
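
The under-filling concern can be sketched in plain Python. This is a simplified model of the fill-then-split approach, not the code in this PR: `fill_then_split`, the `(model_id, payload)` tuple shape, and the numbers (4 models, `max_batch_size=8`) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[str, str]  # (model_id, payload); hypothetical shape for this sketch


def fill_then_split(queue: List[Request], max_batch_size: int) -> Dict[str, List[str]]:
    """Take up to max_batch_size requests, then group them by model ID."""
    batch = queue[:max_batch_size]  # step 1: fill the batch regardless of model
    groups: Dict[str, List[str]] = defaultdict(list)
    for model_id, payload in batch:  # step 2: divide the batch per model
        groups[model_id].append(payload)
    return dict(groups)


# 16 queued requests spread evenly over 4 models.
queue = [(f"model_{i % 4}", f"req_{i}") for i in range(16)]
sub_batches = fill_then_split(queue, max_batch_size=8)
sizes = {model_id: len(payloads) for model_id, payloads in sub_batches.items()}
print(sizes)
```

In this scenario each model ends up with a sub-batch of 2 requests even though the configured batch size is 8 and the queue holds 4 requests per model. Grouping by model before filling would avoid this, at the cost of more bookkeeping per queue.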

abrarsheikh avatar Dec 10 '25 04:12 abrarsheikh

[!WARNING] You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

gemini-code-assist[bot] avatar Dec 10 '25 04:12 gemini-code-assist[bot]