Does the distributed KV cache support V100 hardware?
I am using a V100 GPU to test the distributed KV cache deployment example, but unfortunately it fails because the feature requires the FlashAttention backend.
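For reference, here is a minimal check (assuming a standard PyTorch install) showing why the FlashAttention backend can't load here: vLLM's FlashAttention backend needs Ampere-class GPUs (compute capability 8.0+), while the V100 is Volta (7.0).

```python
import torch

# vLLM's FlashAttention backend requires compute capability >= 8.0 (Ampere);
# the V100 is Volta (7.0), so the backend refuses to load on it.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
if (major, minor) < (8, 0):
    print("FlashAttention backend unsupported on this GPU; "
          "a different attention backend is required.")
```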
@jlcoo Thanks for trying out the distributed kv cache offloading feature. We will support more attention backends soon, please stay tuned.
@DwyaneShi Thanks for the update! I'm really looking forward to the support for more attention backends. Will the distributed kv cache offloading feature support more attention backends in version 0.3?
Where is the source code for the vineyard and vLLM branches? Are they also open source?
@jlcoo We released v0.3.0 recently, and it now supports the XFormers backend. It would be great if you could try the latest version. Please refer to the example in https://aibrix.readthedocs.io/latest/features/distributed-kvcache-and-cross-engine-kv-reuse.html for more details.
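In case it helps, here is a minimal sketch of forcing vLLM onto the XFormers backend on a V100. `VLLM_ATTENTION_BACKEND` is a standard vLLM environment variable; the model name is just a placeholder, and the distributed kv cache configuration itself follows the linked example and is omitted here.

```python
import os

# Force the XFormers attention backend, which runs on Volta GPUs like the V100.
# This must be set before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM

# Placeholder model; swap in your own. The distributed kv cache settings
# come from the linked example and are not shown here.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```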