Woosuk Kwon
@gshtras Can you please provide a concrete number (how much improvement)?
@zhuohan123 Could you please take a look at this? This PR is pretty interesting.
@njhill I believe this may be related to #3466. Could you also take a look?
Hi @xinji1 Thanks for the detailed RFC and the PR, and sorry for the late reply. If I understand correctly, the speedup comes from: 1. Sorting the requests in the...
Yeah, let me fix the error. The PR passed the pre-commit test in CI somehow.
@tlrmchlsmth Just wondering: Do you have any accuracy or performance benchmark numbers?
@njhill Good question. Actually, the MP backend would also work for TPUs. However, I think users such as GKE prefer Ray, because 1) they are interested in multi-host inference (which...
@jikunshang It seems like using IPEX somehow changes the output of the models and thus fails the model tests. Could you please take a look?
@markmc Is this PR waiting for review? Or is it in progress?
@markmc Can you please merge from main again?