wentao

51 comments by wentao

> > I want to confirm: after entering the container, is the command simply to run `vllm serve` directly? Because I can't see it here, I'm not sure whether I...

> I am using vLLM version 0.8.5, and looking at this commit [f25e0d1](https://github.com/vllm-project/vllm/commit/f25e0d1125f873201ae880b50df46a9e3d29f3ba): the PR has been merged into the main branch, but I see that the 0.8.5 branch has not merged...

> ```
> # pip list
> Package                                  Version                             Editable project location
> ---------------------------------------- ----------------------------------- -------------------------
> aiofiles                                 24.1.0
> aiohappyeyeballs                         2.6.1
> aiohttp                                  3.11.18
> aiosignal                                1.3.2
> airportsdata...
> ```

> It is strange :( The lmcache.tar.gz looks good:
>
> ```
> shufan@gpusvr01:/nfs/home/shufan/tmp$ md5sum lmcache.tar.gz
> cdbd0bd5ea1361c75de6218b30b4f077  lmcache.tar.gz
> shufan@gpusvr01:/nfs/home/shufan/tmp$ sudo md5sum lmcache.tar
> 5a10b32fd04af8f8281793c6ee1ba2bb  lmcache.tar
> ```
> ...

> The same for me. I have raised an issue in lmcache, and I will sync it here if there is any progress on the performance issue.

> The same for me. I am currently looking into the PD-separation work. Could you add me on WeChat? It would be convenient for later communication and learning. 13474315223 (WeChat)

> 24G of VRAM is not enough to hold the whole model. Please set device="cpu" when you create the model_manager, and set device="cuda:xxx" when you create the pipeline, so that...
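For anyone hitting the same out-of-memory issue, here is a minimal sketch of the split described above, assuming a DiffSynth-Studio-style API; the `ModelManager` / `WanVideoPipeline` names, the dtype, and the model path are illustrative assumptions, not details confirmed in the thread. The only point is that the weights are loaded with `device="cpu"` while the pipeline is created with `device="cuda:xxx"`.

```python
# Minimal sketch of the CPU-load / GPU-run split described above.
# Assumptions (not confirmed in the thread): a DiffSynth-Studio-style API,
# the ModelManager / WanVideoPipeline class names, the dtype, and the model path.
import torch
from diffsynth import ModelManager, WanVideoPipeline

# Load the full weights into host RAM so they do not need to fit into 24 GB of VRAM.
model_manager = ModelManager(torch_dtype=torch.bfloat16, device="cpu")
model_manager.load_models(["models/your_model.safetensors"])  # hypothetical path

# Create the pipeline on the GPU it will actually run on.
pipe = WanVideoPipeline.from_model_manager(model_manager, device="cuda:0")
```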

> [@tensorflowt](https://github.com/tensorflowt) We sincerely apologize, but we currently do not support multi-GPU parallel processing. This feature is still under development, so please stay tuned. I am currently using 8 cards...

> Hey [@tensorflowt](https://github.com/tensorflowt) We currently don't officially support quantization yet. Please feel free to open a new feature request for AWQ support. Thanks! Refer to: > > * [[Bug]: (I...