[Question] How to access the vLLM-Vineyard integration code mentioned in Distributed KV Cache documentation?
The Distributed KV Cache documentation references a customized vLLM implementation with Vineyard integration.
However, I couldn't locate the corresponding code implementation.
Could you help me clarify:
- Is the customized vLLM code with Vineyard support publicly accessible?
- If available, is it in a separate branch/repository?
Hi Yang,
https://github.com/vllm-project/aibrix/blob/6feec99d77c84e371da9c535054c2b8aa8912704/pkg/controller/kvcache/kvcache_controller.go#L505
I think it uses a customized vllm, something like https://github.com/vllm-project/vllm/compare/main...aibrix:vllm:feat/distributed-kv-cache
From what I understand, this only works with v0.6.x of vLLM.
Great to see you here, @cheyang, long time no see! Yeah, @gaocegege gave the code pointer in vLLM.
- The vLLM code will be refactored to adapt to the v1 architecture, and an RFC will be cut soon. This part will definitely be upstreamed.
- @DwyaneShi made some changes in Vineyard, such as metadata optimization and advanced eviction policies. @DwyaneShi, feel free to comment if you have more information you'd like to share.
Hi Yang,
aibrix/pkg/controller/kvcache/kvcache_controller.go
Line 505 in 6feec99
fmt.Sprintf(
/usr/local/bin/vineyardd --sync_crds true --socket /var/run/vineyard.sock --size --stream_threshold 80 --etcd_cmd etcd --etcd_prefix /vineyard --etcd_endpoint http://%s-etcd-service:2379, kvCache.Name),
I think it uses a customized vllm, something like https://github.com/vllm-project/vllm/compare/main...aibrix:vllm:feat/distributed-kv-cache
From what I understand, this only works with v0.6.x of vLLM.
Thanks for pointing this out!
I think it uses a customized vllm, something like https://github.com/vllm-project/vllm/compare/main...aibrix:vllm:feat/distributed-kv-cache
Is this the actual code for the distributed KV cache used in AIBrix? I would like to make changes to it. Where can I find the actual implementation? Thanks.
@nechamab1 we have released the latest version of KV cache support. Please check the documentation for more details.
All the code is now hosted in this repo: https://github.com/vllm-project/aibrix/blob/main/python/aibrix_kvcache/integration/vllm/vllm_v0.8.5-aibrix-kvcache.patch
This version improves performance a lot, and I suggest you take a look at it.
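For anyone who wants to try the patch locally, here is a rough sketch of applying it to a vLLM checkout. It assumes the patch targets the upstream `v0.8.5` tag (inferred only from the file name) and that you have already downloaded the patch file. `apply_patch` is a hypothetical helper name, not something shipped by AIBrix, so please check the AIBrix documentation for the officially supported steps.

```shell
#!/usr/bin/env bash
# Sketch only: apply a local copy of the AIBrix KV cache patch to a vLLM checkout.
# Assumption: the patch is based on the upstream vLLM v0.8.5 tag (guessed from its file name).
set -euo pipefail

# apply_patch <vllm_dir> <absolute_path_to_patch>
# Hypothetical helper: checks that the patch applies cleanly, then applies it.
apply_patch() {
  local vllm_dir="$1" patch_file="$2"
  # Dry run first, so a mismatched base fails before the working tree is touched.
  git -C "$vllm_dir" apply --check "$patch_file"
  git -C "$vllm_dir" apply "$patch_file"
}

# Example usage (needs network access, so it is left commented out):
#   git clone --branch v0.8.5 --depth 1 https://github.com/vllm-project/vllm.git
#   apply_patch "$PWD/vllm" "$PWD/vllm_v0.8.5-aibrix-kvcache.patch"
```

Pass an absolute path for the patch file, because `git -C` changes directory before the path is resolved. If the `--check` step fails, the checkout is most likely not on the version the patch was written against.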