SangBin Cho
@KuntaiDu, is this ready to review?
I will assign a core label.
I will report concrete progress by EOD today.
https://github.com/ray-project/ray/pull/40498
@jsdir @shixianc @valtab, can you tell me more details about your setup? Is it the same as the issue here (i.e., you have an intermediate router that just has async responses)?
We made some progress on master, though the throughput is not as good as gRPC streaming. Btw, @edoakes, do we automatically batch requests in the server layer? Maybe we...
@alexeykudinkin should we close this?
cc @kevin85421 can you take a look?
Who's going to take this task?
> Can you elaborate on why you think placing the guided decoding parameters in the SamplingParams is a good idea? As I commented in https://github.com/vllm-project/vllm/pull/4130, I think they conceptually overlap...