Does the distributed KV cache support V100 hardware?
I am using a V100 GPU to test the distributed KV cache deployment example, but unfortunately it fails because the feature requires the FlashAttention backend.
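For reference, here is a minimal check (assuming a standard PyTorch install) showing why the FlashAttention backend can't load here: vLLM's FlashAttention backend needs Ampere-class GPUs (compute capability 8.0+), while the V100 is Volta (7.0).

```python
import torch

# vLLM's FlashAttention backend requires compute capability >= 8.0 (Ampere);
# the V100 is Volta (7.0), so the backend refuses to load on it.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
if (major, minor) < (8, 0):
    print("FlashAttention backend unsupported on this GPU; "
          "a different attention backend is required.")
```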
@jlcoo Thanks for trying out the distributed kv cache offloading feature. We will support more attention backends soon, please stay tuned.
@DwyaneShi Thanks for the update! I'm really looking forward to the support for more attention backends. Will the distributed kv cache offloading feature support more attention backends in version 0.3?
Where is the source code for the vineyard and vLLM branches? Are they also open source?
@jlcoo We released v0.3.0 recently, and it now supports the XFormers backend. It would be great if you could try the latest version. Please refer to the example in https://aibrix.readthedocs.io/latest/features/distributed-kvcache-and-cross-engine-kv-reuse.html for more details.
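In case it helps, here is a minimal sketch of forcing vLLM onto the XFormers backend on a V100. `VLLM_ATTENTION_BACKEND` is a standard vLLM environment variable; the model name is just a placeholder, and the distributed kv cache configuration itself follows the linked example and is omitted here.

```python
import os

# Force the XFormers attention backend, which runs on Volta GPUs like the V100.
# This must be set before vLLM is imported.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM

# Placeholder model; swap in your own. The distributed kv cache settings
# come from the linked example and are not shown here.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```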