wenzhaoabc

Results 5 comments of wenzhaoabc

> > Bot detected the issue body's language is not English, translate it automatically. > > I asked weakly what the Artifacts in the plug-in are for. I searched Google...

In the WebUI's model launch parameters panel, you can explicitly set gpu_index to run multiple models on a single GPU. ![image](https://github.com/user-attachments/assets/c61f2133-3d06-423f-bd95-9bc7c2a81619)

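By default, vLLM uses all of the GPU memory left over after loading the model for the KV cache. vLLM can also limit its GPU memory usage through the `--gpu-memory-utilization` parameter, which defaults to 0.9.

> https://github.com/vllm-project/vllm/issues/2430
> https://docs.vllm.ai/en/latest/models/engine_args.html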
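
To make the effect of `--gpu-memory-utilization` concrete, here is a rough back-of-the-envelope sketch in Python. It is not vLLM code: the function names, the `overhead_gib` parameter, and the example sizes (a 24 GiB card, 14 GiB of weights) are all assumptions made for illustration, and the real engine also accounts for activation memory and other overhead when it profiles the GPU.

```python
def kv_cache_budget_gib(total_gib, weights_gib,
                        gpu_memory_utilization=0.9, overhead_gib=0.0):
    """Roughly estimate the memory left for the KV cache (hypothetical helper).

    Simplified model: the engine may use total * utilization in total,
    and whatever the weights and other overhead do not consume goes to
    the KV cache. Returns 0.0 if the weights do not fit in the budget.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    budget = total_gib * gpu_memory_utilization - weights_gib - overhead_gib
    return max(budget, 0.0)


def can_share_gpu(utilizations):
    """Check that several engines on one GPU do not claim more than 100%."""
    return sum(utilizations) <= 1.0


if __name__ == "__main__":
    # Assumed numbers: 24 GiB GPU, 14 GiB of model weights.
    for util in (0.9, 0.7, 0.5):
        kv = kv_cache_budget_gib(24, 14, util)
        print(f"utilization={util}: about {kv:.1f} GiB for KV cache")
    print("two engines at 0.45 each fit:", can_share_gpu([0.45, 0.45]))
```

Under this simplified model, lowering the value shrinks the KV cache first, so running several models on one card means giving each a smaller fraction while still leaving room for its weights.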