Yihua Cheng
The correct URL should start with: https://github.com/vllm-project/vllm/tree/main/examples/others/lmcache (I will create a quick PR to fix this.)
@csbo98 I haven't started on this, sorry. Feel free to do it if you want!
cc @KuntaiDu @YaoJiayi
> Why is the first token of the generated text from the decode instance different from the first token generated by the prefill instance? @maobaolong In the example, the prefill instance first generates...
> Thanks for this PR 😃. Here are some small proposed changes to restore V0 support (broken by this PR). @hasB4K Thanks for the catch! I'll update the...
> Should we just deprecate V0? Thanks for bringing this up, @robertgshaw2-redhat! I think we should still keep V0 until a performant V1 connector implementation is ready (we are...
@hasB4K @robertgshaw2-redhat Hey, I just pushed some new updates to address the review comments. Feel free to take a look and let me know if it does not resolve your...
@hasB4K @maobaolong About the memory safety / memory leak issue: currently, the implementation of this is pretty hacky. I will spend some time checking whether there could be any problems....
> @ApostaC Do you think it would be feasible to include a simple online server example in this PR to demonstrate how the orchestrator would interact with KVConnector? @VertexC Yeah, it...
> But is there a demo to run? Can I run it like this? @Huixxi There is an example in #16625.