libin817927
Crucial feature, eagerly awaited.
Hi, I'm trying to deploy in the company's k8s cluster. I expect to deploy the Qwen3 32B model as a small cluster on 4 NVIDIA L4 GPUs. I'll describe my...
> [@libin817927](https://github.com/libin817927) thanks for your detailed questions. These are fair and reasonable asks. I happened to do some testing against the v0.3.0-rc release. I can list the missing pieces, especially the kv...
> [@libin817927](https://github.com/libin817927) what's your environment? Are you running on Volcano Engine? If so, please share the node image details. If not, please let me know how to allocate RDMA resources...
> > > [@libin817927](https://github.com/libin817927) what's your environment? Are you running on Volcano Engine? If so, please share the node image details. If not, please let me know how to allocate...
> [@TianTengya](https://github.com/TianTengya) Yes. P&D is not the focus; we are busy with kv cache solutions and plan to fully unblock prefix-cache scenarios first. The next step would be xPyD. I...
> [@libin817927](https://github.com/libin817927) Thanks for trying out the kv cache offloading feature. If you'd like to use AIBRIX_KV_CACHE_OL_L1_CACHE_CAPACITY_GB=80 (i.e., use an extra 80 GB of DRAM per engine process for kv cache offloading...
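
For reference, a minimal sketch of how that setting might appear in a Kubernetes pod spec. Only the environment variable name and the 80 GB value come from the comment above; the container layout and the memory request are assumptions (the pod would need DRAM headroom beyond the model's own usage for the L1 cache to fit):

```yaml
# Hypothetical container fragment; only the env var is from the thread.
env:
  - name: AIBRIX_KV_CACHE_OL_L1_CACHE_CAPACITY_GB
    value: "80"   # reserve an extra 80 GB of DRAM per engine process for kv cache offloading
resources:
  requests:
    memory: "96Gi"   # assumed: must cover the 80 GB L1 cache plus the engine's baseline usage
```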