ztang2370
Hi @jiarong0907, I would like to take on the Ollama integration issue, and I think it’ll be a great way for me to get more familiar with the codebase.
> Great! I think the sleep and wakeup functionality has already been merged, right? @jiarong0907 I think the traffic monitoring and sleep management are tied to the router. Since we...
> @ztang2370 @cui36 Having the vLLM semantic router is great, but I would suggest we add it as a feature later. > > For the example, we can just use the...
@njhill @simon-mo I'd appreciate a review when you have time
> The numbers look great, but it would be good to also run some standard benchmarks to assess overall performance impact. For example, there will be some TTFT impact from...
> Thanks @ztang2370 I think this looks pretty good for a short term solution, apart from the comments inline. > > I'm still thinking though about how we might reconcile...
> Thanks @ztang2370 and sorry for the delay! > > Most remaining comments are related to avoiding doing unnecessary work/loops. Thanks @njhill for your review! I’ve addressed the remaining comments....
> Thanks @ztang2370! I think some further simplifications can be made, see inline comments. @njhill Thank you for the suggested improvements! Updated.