[Feat] [WIP] Ollama integration

Open · ztang2370 opened this issue 3 months ago · 1 comment

Issue https://github.com/ovg-project/kvcached/issues/81

Adds initial support for integrating Ollama with kvcached.
Verified the workflow locally on a single CUDA GPU (RTX 3090).
The current implementation runs end-to-end but still needs:
- Additional testing (multi-GPU, different environments)
- Review of the integration approach (it may not follow best practices)
- Potential optimizations for performance and maintainability

Marked as WIP. Feedback welcome on design and direction.

9.16 update: https://docs.google.com/document/d/1mDTKBoCZslLcSu2OsgCNVzl-J6HeY-Vl7s19V938PHY/edit?tab=t.0

9.17 update: test branches:
- kvcached: https://github.com/ztang2370/kvcached/tree/ztang/test-ollama-integration
- ollama: https://github.com/ztang2370/ollama/tree/my-v0.11.8

git clone git@github.com:ztang2370/kvcached.git
cd kvcached
git switch ztang/test-ollama-integration
git submodule update --init
cd engine_integration/ollama-v0.11.8 && git switch my-v0.11.8
Then set up, build, and run Ollama.
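Once the server is up, one way to check which models it currently holds in memory is Ollama's `/api/ps` endpoint. This is an illustrative sketch, not part of the PR: it assumes the default address `http://localhost:11434`, the helper names `fetch_json` and `loaded_models` are hypothetical, and the `fetch` argument can be replaced with a stub so the parsing runs without a live server.

```python
import json
import urllib.request
from typing import Callable, List, Tuple

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def fetch_json(path: str) -> dict:
    """GET a JSON document from the local Ollama server (hypothetical helper)."""
    with urllib.request.urlopen(OLLAMA_URL + path, timeout=10) as resp:
        return json.load(resp)


def loaded_models(fetch: Callable[[str], dict] = fetch_json) -> List[Tuple[str, int]]:
    """Return (model name, bytes in VRAM) for each model /api/ps reports as loaded."""
    data = fetch("/api/ps")
    # A missing "size_vram" field is treated as 0 bytes.
    return [(m["name"], m.get("size_vram", 0)) for m in data.get("models", [])]
```

With the server running, `loaded_models()` returns a list of `(name, size_vram)` pairs; during a multi-model test, more than one entry suggests the models are loaded side by side.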

9.21 update: webui: https://drive.google.com/file/d/1ZUGWDK3JleCciizZyTybe33inmGvAmVS/view?usp=sharing

TODO:

- [ ] Complete a bug-free end-to-end workflow
- [ ] A running example
- [ ] Benchmark performance

Sep 14 '25 12:09 ztang2370

@ztang2370 Thanks for the great work!

The direction this PR is heading in looks good to me. To show the benefits of kvcached, I think the test needs to run at least two models concurrently with Ollama.
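A rough sketch of such a test, for illustration only: it sends one non-streaming `/api/generate` request per model from separate threads so both requests are in flight at once. The model tags `llama3.2:1b` and `qwen2.5:0.5b` are placeholders assumed to be pulled already, the helpers `generate` and `run_concurrently` are hypothetical, and depending on configuration (for example `OLLAMA_MAX_LOADED_MODELS`), Ollama may evict one model to load the other. The `send` argument can be replaced with a stub to check the concurrency logic offline.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumptions: default Ollama address; model tags are placeholders.
OLLAMA_URL = "http://localhost:11434"
MODELS = ["llama3.2:1b", "qwen2.5:0.5b"]


def generate(model: str, prompt: str) -> str:
    """Send one non-streaming /api/generate request and return the generated text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL + "/api/generate",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


def run_concurrently(
    models: List[str],
    prompt: str,
    send: Callable[[str, str], str] = generate,
) -> Dict[str, str]:
    """Issue one request per model in parallel; results are keyed by model tag."""
    # One worker per model so every request can run at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {m: pool.submit(send, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}
```

With the server running, `run_concurrently(MODELS, "Hello")` returns a dict keyed by model tag. Querying `/api/ps` while the requests are in flight shows whether both models were kept loaded, which is the scenario needed to show kvcached's benefits.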

The README has some repetitive wording generated by AI. Please also clean up the symbols the AI added to the setup script.

We also need a cool example to show this off. For example, in Open WebUI (https://github.com/open-webui/open-webui), we could have two models running together in the model list. These are just some quick thoughts; feel free to pick the most reasonable and easiest way to show this.

Sep 14 '25 16:09 jiarong0907