inference: Qwen2.5 7B uses too much GPU memory

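Why does launching the Qwen2.5 7B Instruct model with Xinference (started directly from the web UI using the qwen2.5-instruct option) immediately take up more than 40 GB of GPU memory? The card is an A6000 with 48 GB. Is it because the context length is 32k? The engine I selected is vLLM.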

Sep 25 '24 09:09 mengxianglong123

gpu_memory_utilization defaults to 0.9; you can adjust the parameters yourself.

For example: max_model_len: 32768, gpu_memory_utilization: 0.8

Sep 26 '24 02:09 Valdanitooooo
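
A rough illustration of why the usage lands above 40 GB: as I understand vLLM's behavior, it profiles the model and then preallocates a KV-cache pool so that weights, activations and cache together fill about gpu_memory_utilization × total GPU memory, regardless of how small the model is. The sketch below only does that multiplication; treating the A6000's 48 GB as exactly 48 GiB is a simplification, and reserved_gib is a hypothetical helper.

```python
# Rough sketch (assumption about vLLM's sizing): total usage on one GPU is
# about gpu_memory_utilization * total memory, independent of model size.

def reserved_gib(total_gib: float, gpu_memory_utilization: float) -> float:
    """Approximate memory vLLM will occupy on one GPU, in GiB (hypothetical helper)."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return total_gib * gpu_memory_utilization


if __name__ == "__main__":
    total = 48.0  # A6000 from the question, treated as 48 GiB for simplicity
    for util in (0.9, 0.8):
        print(f"gpu_memory_utilization={util}: ~{reserved_gib(total, util):.1f} GiB")
```

With the 0.8 value suggested above, the same card would show roughly 38.4 GiB in use. The weights stay the same size; only the preallocated cache pool shrinks.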

Hi, when launching a model with Xinference, can this vLLM parameter be specified? I couldn't find it in the documentation.

Sep 26 '24 03:09 mengxianglong123

Yes, it can. The docs may be incomplete; pass it like this: --gpu_memory_utilization 0.8

Sep 26 '24 03:09 Valdanitooooo
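
A minimal sketch of what such a launch command could look like, following the --gpu_memory_utilization 0.8 form from the reply above. The helper build_launch_cmd is hypothetical; the --model-name, --model-engine, --size-in-billions and --model-format option names and the qwen2.5-instruct/pytorch values are assumptions based on the Xinference CLI and should be checked with xinference launch --help for your version. The extra --key value pairs are assumed to be forwarded to vLLM unchanged.

```python
import shlex


def build_launch_cmd(model_name: str, engine: str, size_b: int,
                     model_format: str, **engine_kwargs) -> list[str]:
    """Build an `xinference launch` argv (hypothetical helper).

    Extra keyword arguments become `--key value` pairs; that Xinference passes
    them through to the engine (here vLLM) is an assumption from this thread.
    """
    cmd = [
        "xinference", "launch",
        "--model-name", model_name,
        "--model-engine", engine,
        "--size-in-billions", str(size_b),
        "--model-format", model_format,
    ]
    for key, value in engine_kwargs.items():
        # Keep underscores: vLLM argument names such as gpu_memory_utilization
        cmd += [f"--{key}", str(value)]
    return cmd


if __name__ == "__main__":
    argv = build_launch_cmd("qwen2.5-instruct", "vllm", 7, "pytorch",
                            gpu_memory_utilization=0.8, max_model_len=32768)
    # Paste the printed line into a shell, or pass argv to subprocess.run
    print(shlex.join(argv))
```

Building the command as a list keeps each value a separate argument, so the same list can be handed to subprocess.run without shell quoting issues.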

What exactly does the gpu_memory_utilization parameter mean?

Oct 07 '24 10:10 bigbrother666sh

It is a vLLM parameter, documented here: https://github.com/vllm-project/vllm/blob/8eeb85708428b7735bbd1156c81692431fd5ff34/vllm/entrypoints/llm.py#L105

Oct 08 '24 01:10 Valdanitooooo
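
To put the 32k context from the original question in perspective, here is a back-of-the-envelope KV-cache estimate. The architecture numbers (28 layers, 4 KV heads, head dimension 128, bf16) are my reading of the Qwen2.5-7B config and should be checked against the model's config.json; the 14.2 GiB weight footprint (about 7.6B parameters in bf16) and ignoring activation and CUDA-graph overhead are simplifying assumptions. The pool formula reflects my understanding of gpu_memory_utilization: whatever is left of util × total after the weights goes to the KV cache.

```python
GIB = 1024 ** 3

# Assumed Qwen2.5-7B attention shape (verify against the model's config.json).
NUM_LAYERS = 28
NUM_KV_HEADS = 4      # grouped-query attention: far fewer KV heads than query heads
HEAD_DIM = 128
DTYPE_BYTES = 2       # bf16 / fp16


def kv_bytes_per_token() -> int:
    """Bytes of K and V cache one token needs across all layers."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * DTYPE_BYTES


def kv_gib(num_tokens: int) -> float:
    """KV-cache size in GiB for num_tokens tokens."""
    return num_tokens * kv_bytes_per_token() / GIB


def kv_pool_tokens(total_gib: float, util: float, weights_gib: float) -> int:
    """Approximate KV pool capacity in tokens: (util * total - weights) / per-token bytes.

    Ignores activation and CUDA-graph overhead, so the real capacity is lower.
    """
    pool_bytes = (total_gib * util - weights_gib) * GIB
    return int(pool_bytes // kv_bytes_per_token())


if __name__ == "__main__":
    print(f"one 32k-token sequence: {kv_gib(32768):.2f} GiB of KV cache")
    # 14.2 GiB of weights is an assumption (~7.6B parameters in bf16).
    print(f"KV pool at 0.9 on 48 GiB: ~{kv_pool_tokens(48.0, 0.9, 14.2)} tokens")
```

Under these assumptions a single 32k-token sequence needs under 2 GiB of cache, so the 40+ GB in the question comes from the preallocated pool, not from the context length. Lowering gpu_memory_utilization shrinks that pool; lowering max_model_len mainly caps the longest sequence vLLM will accept.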

thx

Oct 09 '24 02:10 bigbrother666sh

This issue is stale because it has been open for 7 days with no activity.

Oct 16 '24 19:10 github-actions[bot]

Are all the parameters in that document? I didn't see max_model_len there.

---Original message--- From: @.> Sent: Wednesday, October 9, 2024, 10:18 AM To: @.>; Cc: @.@.>; Subject: Re: [xorbitsai/inference] Qwen2.5 7B uses too much GPU memory (Issue #2368)

Oct 17 '24 00:10 goactiongo

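How did you manage to deploy the Qwen2.5 model? When I launch it from the UI, I get CUDA OUT OF MEMORY no matter what, even with Qwen2.5-0.5B-Instruct on a machine with a 3090. When I launch it from the shell, it fails immediately with requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)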

Oct 18 '24 11:10 ipc-robot

Is something else already occupying GPU memory? Check with nvtop or nvitop.

Oct 20 '24 14:10 bigbrother666sh
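
If nvtop or nvitop isn't installed, a small sketch of the same check using nvidia-smi's standard CSV query (--query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits). The helpers parse_gpu_memory and fits are hypothetical, and the rule of thumb that roughly gpu_memory_utilization × total must still be free before vLLM starts is an assumption, not a documented vLLM guarantee.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int, int]]:
    """Parse nvidia-smi CSV lines into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def fits(used_mib: int, total_mib: int, util: float) -> bool:
    """Rule-of-thumb check (assumption): is util * total still free on this GPU?"""
    return total_mib - used_mib >= util * total_mib


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found")
    else:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for index, used, total in parse_gpu_memory(out):
            status = "ok" if fits(used, total, 0.9) else "already occupied?"
            print(f"GPU {index}: {used}/{total} MiB used -> {status}")
```

If a GPU reports "already occupied?", either free it first or lower gpu_memory_utilization so the requested share fits in what is actually free.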

This issue is stale because it has been open for 7 days with no activity.

Oct 27 '24 19:10 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

Nov 02 '24 19:11 github-actions[bot]
