Venkat Raman
No worries! I wanted to be precise and sure, considering earlier discussions in this thread. > the queueing system in TGI is a bit "naive" since it has no...
> Lessons we learned from V1: > > To achieve high GPU utilization, we should care about everything happening on the CPU. > - Python is slow. > > Scheduling...
@varungup90 There was a bug in the constructor init of the `vtc-basic` router that was not caught because of the options-based init in tests. I found this while writing benchmarks. Reproduced and...
@Jeffwan @varungup90 I've updated this PR to include a benchmark notebook as well - > [x] custom benchmark results in notebook similar to [this](https://github.com/vllm-project/aibrix/blob/main/benchmarks/plot/aibrix0.1-routing.ipynb) > * I have grouped users based...
Update: rebased onto main.
> overall looks good to me. @Venkat2811 BTW, some of the features in this PR seem generic enough to be extracted as standalone components. Personally, I think it would be...
Split into separate PRs as agreed: - https://github.com/vllm-project/aibrix/pull/1210 - https://github.com/vllm-project/aibrix/pull/1211
Ready for review @Jeffwan
@Jeffwan Done 👍🏽
@Jeffwan is there anything else pending in this PR? I'm looking to get this merged.