SangBin Cho

409 comments by SangBin Cho

@jsdir @shixianc @valtab can you guys give me more details about the setup? Is it the same as the issue here (you have an intermediate router that's just having async responses)?

We made some progress on master, though the throughput is not as good as gRPC streaming. Btw, @edoakes do we automatically batch requests in the server layer? Maybe we...

cc @kevin85421 can you take a look?

> Can you elaborate on why you think placing the guided decoding parameters in the SamplingParams is a good idea? As I commented in https://github.com/vllm-project/vllm/pull/4130, I think they conceptually overlap...