
Does LLM Cache support V100 hardware?

[Open] jlcoo opened this issue 9 months ago • 4 comments

I was using a V100 GPU to test deploying the Distributed KV Cache example; unfortunately it failed, because it requires the flash attention backend. [error screenshot attached]
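For reference, FlashAttention requires an NVIDIA GPU with compute capability 8.0 or newer (Ampere and later), while the V100 reports 7.0. A minimal sketch, assuming PyTorch is installed, to check whether the local GPU can run the flash attention backend:

```python
import torch

# FlashAttention needs compute capability >= 8.0 (Ampere or newer);
# the V100 reports 7.0, which is why the flash attention backend fails there.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: {major}.{minor}")
print("FlashAttention supported:", (major, minor) >= (8, 0))
```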

jlcoo · Mar 04 '25, 08:03

@jlcoo Thanks for trying out the distributed KV cache offloading feature. We will support more attention backends soon; please stay tuned.

DwyaneShi · Mar 04 '25, 18:03

@DwyaneShi Thanks for the update! I’m really looking forward to support for more attention backends. Will the distributed KV cache offloading feature, with more attention backends supported, be available in version 0.3?

jlcoo · Mar 05 '25, 03:03

> @jlcoo Thanks for trying out the distributed KV cache offloading feature. We will support more attention backends soon; please stay tuned.

Where is the source code of the LLM vineyard cache and the vLLM branch? Is that also open source?

huanggangfeng · Mar 13 '25, 02:03

@jlcoo We released v0.3.0 recently, and it now supports the XFormers backend. It would be great if you could try the latest version. Please refer to the example at https://aibrix.readthedocs.io/latest/features/distributed-kvcache-and-cross-engine-kv-reuse.html for more details.
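As a minimal sketch of running on V100-class GPUs with the XFormers backend: `VLLM_ATTENTION_BACKEND` is vLLM's documented attention-backend override, and the model name below is just a placeholder, not part of the linked example.

```python
import os

# Ask vLLM to use the XFormers attention backend instead of FlashAttention,
# which is unavailable on pre-Ampere GPUs such as the V100.
# Must be set before vllm is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```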

DwyaneShi · May 29 '25, 03:05