Issues opened by Jiarong Xing (4 results)
It would be great to support Ollama with kvcached for local deployment of multiple LLMs.
When GPU memory is almost full, kvcached could support offloading the KV cache to CPU memory or even disk. Should this be done with CUDA UVM or with more application-level semantics?
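A minimal sketch of the application-level option, for discussion only: cold KV-cache blocks are explicitly copied to pinned host memory under GPU memory pressure and copied back when the request is scheduled again. The block shape, helper names, and swap policy below are illustrative assumptions, not kvcached's actual internals; the alternative would be to let CUDA UVM migrate pages transparently.

```python
import torch

# Illustrative KV-cache block layout: [K/V, layers, tokens, heads, head_dim]
NUM_LAYERS, BLOCK_TOKENS, NUM_HEADS, HEAD_DIM = 32, 16, 8, 128

def new_kv_block(device: str) -> torch.Tensor:
    # Allocate one KV-cache block; pin host memory so H2D/D2H copies are async.
    return torch.empty(2, NUM_LAYERS, BLOCK_TOKENS, NUM_HEADS, HEAD_DIM,
                       dtype=torch.float16, device=device,
                       pin_memory=(device == "cpu"))

def offload_block(gpu_block: torch.Tensor) -> torch.Tensor:
    # Spill a cold block to pinned CPU memory when GPU memory runs low.
    cpu_block = new_kv_block("cpu")
    cpu_block.copy_(gpu_block, non_blocking=True)
    torch.cuda.synchronize()  # make sure the copy finished before dropping the GPU copy
    return cpu_block

def reload_block(cpu_block: torch.Tensor) -> torch.Tensor:
    # Bring the block back to the GPU before the request becomes active again.
    return cpu_block.to("cuda", non_blocking=True)

if __name__ == "__main__":
    block = new_kv_block("cuda")
    block = offload_block(block)   # memory pressure: spill to host
    block = reload_block(block)    # request resumes: fetch back
    print(block.device)
```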
Can we add an example to demonstrate kvcached with the vLLM Semantic Router? https://vllm-semantic-router.com/ We can run multiple models on one GPU for the router to choose from, including the sleep and...
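A rough sketch of the setup this asks for, assuming two OpenAI-compatible vLLM servers are already running on the same GPU (e.g. both launched with kvcached enabled so their KV caches share GPU memory). The ports, model names, and keyword-based routing below are placeholder assumptions; the real vLLM Semantic Router classifies requests semantically rather than by keywords.

```python
import requests

# Two hypothetical backends sharing one GPU; URLs and model names are assumptions.
BACKENDS = {
    "code": {"url": "http://localhost:8001/v1/chat/completions",
             "model": "Qwen2.5-Coder-7B-Instruct"},
    "chat": {"url": "http://localhost:8002/v1/chat/completions",
             "model": "Llama-3.1-8B-Instruct"},
}

def route(prompt: str) -> dict:
    # Stand-in for semantic classification: pick the coding model if the
    # prompt looks code-related, otherwise the general chat model.
    key = "code" if any(w in prompt.lower() for w in ("code", "python", "bug")) else "chat"
    return BACKENDS[key]

def ask(prompt: str) -> str:
    # Forward the request to the chosen backend using the OpenAI chat format.
    backend = route(prompt)
    resp = requests.post(backend["url"], json={
        "model": backend["model"],
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Write a Python function that reverses a string."))
```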