SangBin Cho

Results 292 comments of SangBin Cho

Can you try merging the latest master? I've seen this happen sometimes... not sure what the root cause is.

Thanks for the quality review! I will finish addressing comments by eod today :)!

Update: Swapping is currently only for decoding, so I think decoupling is out of scope for this PR. I can address it better in the next PR.

Here's the e2e PR for the scheduler on top of this branch: https://github.com/rkooo567/vllm/pull/15

Addressed comments relevant to this PR https://github.com/rkooo567/vllm/pull/15#issuecomment-2021809284. cc @simon-mo to take another look!

Status: will separate out swapping and add a lot more unit tests.

Benchmark result:
before: Throughput: 2.01 requests/s, 972.94 tokens/s
after: Throughput: 1.99 requests/s, 961.40 tokens/s
I'd say it is just the same.
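For reference, here is a minimal sketch of how the two runs compare in relative terms. The helper name `relative_change` is made up for illustration and is not part of any benchmark script; the numbers are copied from the comment above.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Throughput numbers copied from the benchmark comment above.
requests_change = relative_change(2.01, 1.99)
tokens_change = relative_change(972.94, 961.40)

print(f"requests/s: {requests_change:+.2f}%")
print(f"tokens/s:   {tokens_change:+.2f}%")
```

Both metrics move by roughly 1%, which fits the "just the same" reading if run-to-run noise is of a similar size (the noise level is an assumption, since only one run per side is reported).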

@simon-mo Updated (plz take a look one more time):
- swap is a separate API now
- each API is more thoroughly tested. Also better unit testing for swapping.
-...

@zhuohan123 @simon-mo As we discussed offline, I updated the code based on the proposal I made.
- Made all _schedule APIs as stateless as possible.
- Each API can be used...