
web_demo.py runs in about 10 GB of GPU memory. Why does launching vLLM in API-server mode need 23 GB?

triumph opened this issue · 1 comment

web_demo.py runs in about 10 GB of GPU memory, so why does vLLM in API-server mode need 23 GB? The web demo was launched with: python web_demo.py --device cuda --dtype bf16

The vLLM launch command was: /services/srv/MiniCPM-vllm/venv/bin/python -m vllm.entrypoints.openai.api_server --model /services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/ --trust-remote-code


triumph · Apr 25 '24

Hi, vLLM has its own GPU memory management. When the API server initializes the LLM, gpu_memory_utilization defaults to 0.9, so roughly 22.5 GB is pre-allocated. See the code: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py#L93
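If you want vLLM to reserve less memory, you can lower gpu_memory_utilization. A minimal sketch is below; the value 0.5 is an illustrative assumption, not a tuned recommendation, and the model path is the one from your command:

```python
# Sketch: capping vLLM's GPU memory reservation.
# For the OpenAI-compatible API server, the equivalent flag is:
#   --gpu-memory-utilization 0.5
from vllm import LLM, SamplingParams

llm = LLM(
    model="/services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/",
    trust_remote_code=True,
    gpu_memory_utilization=0.5,  # default is 0.9, hence the ~22.5 GB reservation
)

# Quick smoke test that the engine still serves requests with the smaller budget.
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

Note that a lower value leaves less room for the KV cache, so very long contexts or high concurrency may fail to schedule.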

iceflame89 · May 09 '24