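When deploying small models from the Xinference web UI, can one slot only host one model? I can't deploy multiple models even though there is plenty of GPU memory.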

System Info

Server error: 503 - [address=0.0.0.0:35434, pid=25922] No available slot found for the model

Running Xinference with Docker?

[ ] docker
[X] pip install
[ ] installation from source

Version info

xinference 0.13.3

The command used to start Xinference

cd /data/galileo/models/xinference
. xinference/bin/activate
xinference-local --host 0.0.0.0 --port 8888

Reproduction

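When deploying models from the web UI, can a slot only host one model? I can't deploy multiple models even though there is plenty of GPU memory. I first deployed a qwen0.5b, after which GPU usage was 5999MiB / 49152MiB. When I then deploy a second qwen0.5b, I get the error above.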

Expected behavior

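Slots should be used sensibly, so that several models can be deployed on a GPU that has enough free memory.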

Sep 06 '24 02:09 Songjiadong

Same here, waiting for a version update. For now I'm running multiple models by starting multiple Xinference instances.

Sep 06 '24 08:09 Valdanitooooo

In the WebUI's model launch parameters, explicitly setting gpu_index lets you run multiple models on a single GPU.

Sep 06 '24 12:09 wenzhaoabc

@wenzhaoabc Try it with vLLM; it doesn't seem to work.

Sep 13 '24 03:09 Songjiadong

> @wenzhaoabc Try it with vLLM; it doesn't seem to work.

Tried it: vLLM needs a GPU all to itself. After switching to Transformers, I can run the following two models on a single 4090:

--model-engine Transformers --gpu-idx 1 -n qwen2-instruct -f pytorch --gpu_memory_utilization 0.7
--model-engine Transformers --gpu-idx 1 -n qwen2-instruct -f pytorch
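A minimal dry-run sketch of the two launches in the comment above. The flags are copied from the comment; prefixing them with `xinference launch` is an assumption (the comment shows only the options), and `launch_command` / `COMMANDS` are hypothetical names. Whether `--gpu_memory_utilization` has any effect under the Transformers engine is not confirmed in this thread.

```python
import shlex
from typing import List, Optional


def launch_command(gpu_idx: int, extra: Optional[List[str]] = None) -> List[str]:
    """Build an `xinference launch` argument list that pins the model to one GPU.

    Hypothetical helper: the `xinference launch` prefix is assumed, the
    options are the ones quoted in the thread.
    """
    cmd = [
        "xinference", "launch",
        "--model-engine", "Transformers",
        "--gpu-idx", str(gpu_idx),
        "-n", "qwen2-instruct",
        "-f", "pytorch",
    ]
    return cmd + list(extra or [])


# Both launches share --gpu-idx 1, which is what places them on the same card.
COMMANDS = [
    launch_command(1, ["--gpu_memory_utilization", "0.7"]),  # copied as-is from the comment
    launch_command(1),
]

if __name__ == "__main__":
    for cmd in COMMANDS:
        # Dry run: only print the shell command. To launch for real, pass cmd
        # to subprocess.run(cmd, check=True) with an Xinference server running.
        print(shlex.join(cmd))
```

The point of the sketch is the shared GPU index: leaving it out lets Xinference choose placement on its own, which is the situation that produced the "No available slot found" error in this issue.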

Sep 14 '24 08:09 zhangxianglink

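By default, vLLM claims GPU memory up front: after loading the model weights, it allocates the remaining memory, up to the --gpu-memory-utilization fraction of the card (default 0.9), as KV cache. Lowering --gpu-memory-utilization reduces how much memory vLLM takes.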

https://github.com/vllm-project/vllm/issues/2430
https://docs.vllm.ai/en/latest/models/engine_args.html
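A rough sketch of the memory arithmetic behind that comment, using the reporter's 49152 MiB card. It assumes a simplified model in which each vLLM instance reserves `gpu_memory_utilization × total memory` when it starts and needs that much memory to be free; actual vLLM versions differ in how they profile memory. The helper names and the weight/activation sizes in the usage lines are hypothetical.

```python
"""Back-of-the-envelope model of vLLM's per-instance GPU memory reservation.

Simplifying assumption: an instance reserves gpu_memory_utilization * total
memory at startup and cannot start unless that much memory is free.
"""


def reserved_mib(total_mib: float, utilization: float) -> float:
    """Memory one instance pre-allocates (weights + activations + KV cache)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return total_mib * utilization


def kv_cache_mib(total_mib: float, utilization: float,
                 weights_mib: float, activation_peak_mib: float) -> float:
    """What is left of the reservation after weights and activations goes to KV cache."""
    return reserved_mib(total_mib, utilization) - weights_mib - activation_peak_mib


def can_start(total_mib: float, used_mib: float, utilization: float) -> bool:
    """True if the card's free memory covers the new instance's reservation."""
    return total_mib - used_mib >= reserved_mib(total_mib, utilization)


if __name__ == "__main__":
    total = 49152.0  # the reporter's card, in MiB

    first = reserved_mib(total, 0.9)  # vLLM's default utilization
    print(f"first instance reserves {first:.1f} MiB")
    print("second instance at 0.9 fits:", can_start(total, first, 0.9))

    half = reserved_mib(total, 0.45)
    print("two instances at 0.45 fit:", can_start(total, half, 0.45))

    # Hypothetical sizes, for illustration only.
    print(f"KV cache at 0.9: {kv_cache_mib(total, 0.9, 1000, 500):.1f} MiB")
```

Under this model, the default 0.9 reserves about 44237 MiB and leaves about 4915 MiB, so a second instance at 0.9 cannot start, while two instances at 0.45 each can. This only covers memory-level conflicts. The "No available slot found" error in this issue appears to come from Xinference's own GPU allocation rather than from vLLM running out of memory, which would be consistent with lowering the utilization to 0.2 not helping.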

Sep 15 '24 04:09 wenzhaoabc

@wenzhaoabc Adding --gpu-memory-utilization and setting it to 0.2 doesn't help either.

Sep 20 '24 09:09 Songjiadong

@zhangxianglink Yes, with Transformers you can run multiple instances.

Sep 20 '24 09:09 Songjiadong

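> Same here, waiting for a version update. For now I'm running multiple models by starting multiple Xinference instances.

Same for me. Even after limiting vLLM's GPU memory utilization, I still can't load another model.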

Sep 23 '24 08:09 guoping1127

This issue is stale because it has been open for 7 days with no activity.

Sep 30 '24 19:09 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

Oct 05 '24 19:10 github-actions[bot]

When will this issue be fixed?

Oct 08 '24 01:10 Songjiadong

Same issue: vLLM cannot run multiple models on a single GPU, even with gpu-memory-utilization set.

Feb 13 '25 06:02 zhuyl96

@zhuyl96 How did you work around this?

Feb 14 '25 01:02 Songjiadong