Inference: several GPUs are idle, but an error reports that one GPU is out of memory

System Info

NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2

Running Xinference with Docker?

[X] docker
[ ] pip install
[ ] installation from source

Version info

Release: v0.15.2

The command used to start Xinference

docker run \
  -v /root/.xinference:/root/.xinference \
  -v /root/.cache/huggingface:/root/.cache/huggingface \
  -v /root/.cache/modelscope:/root/.cache/modelscope \
  -p 9997:9997 \
  --name xinference_gpu \
  -d \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -e XINFERENCE_HOME=/root/.xinference \
  --gpus all \
  registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:latest \
  xinference-local -H 0.0.0.0 --log-level DEBUG

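Reproduction

I have a few questions:

1. The machine has four GPUs and no workload is running at the moment. Why are the model's resources not released?
2. The error below reports OutOfMemoryError. Why are the resources of the other GPUs not used at that point?
3. With ollama I never saw an error about insufficient GPU resources. Does ollama automatically draw resources from different GPUs, and is that why I never hit this error?
4. In xf (Xinference), can a model only use the resources of one fixed GPU, with no way to span GPUs? Is this a limitation of xf, or is it inherent in principle?
5. xf fails often, so analyzing longer documents always ends in an error. How can this be solved?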

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.43 GiB. GPU 0 has a total capacity of 23.50 GiB of which 6.64 GiB is free. Process 9281 has 2.95 GiB memory in use. Process 20818 has 13.86 GiB memory in use. Of the allocated memory 12.95 GiB is allocated by PyTorch, and 643.88 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
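
The figures in the message can be checked by hand. The following is a minimal sketch that uses only the numbers quoted in the traceback (the helper name `fits_on_device` is hypothetical). It illustrates that the request is evaluated against GPU 0 alone, so free memory on the other cards does not enter the calculation:

```python
# Numbers copied from the traceback above (all in GiB).
TOTAL_GIB = 23.50
FREE_GIB = 6.64
REQUESTED_GIB = 18.43
PROCESS_USAGE_GIB = [2.95, 13.86]  # processes 9281 and 20818


def fits_on_device(requested_gib: float, free_gib: float) -> bool:
    """Return True if a single allocation fits in one device's free memory."""
    return requested_gib <= free_gib


used_gib = sum(PROCESS_USAGE_GIB)
shortfall_gib = REQUESTED_GIB - FREE_GIB

print(f"in use on GPU 0: {used_gib:.2f} of {TOTAL_GIB:.2f} GiB")
print(f"fits on GPU 0: {fits_on_device(REQUESTED_GIB, FREE_GIB)}")
print(f"shortfall: {shortfall_gib:.2f} GiB")
```

A single PyTorch allocation lives on one device, so the 18.43 GiB request cannot be satisfied by combining free memory from several cards unless the model itself is loaded across multiple GPUs.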

Expected behavior

The model should be able to draw resources from different GPUs without raising an error.

Sep 22 '24 10:09 goactiongo

To load a model across multiple GPUs, select multiple GPUs in the n-gpu field in the UI.
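
The same setting can also be passed from Python instead of the UI. This is a minimal sketch, assuming the Xinference Python client accepts an `n_gpu` argument on `launch_model`. The endpoint matches the port in the `docker run` command above; the model name, model type, and GPU count are placeholders, and the helper names are hypothetical. Whether a given model family honors `n_gpu` may depend on the installed version:

```python
def build_launch_kwargs(model_name: str, model_type: str, n_gpu: int) -> dict:
    """Collect the keyword arguments for launching a model on n_gpu GPUs."""
    if n_gpu < 1:
        raise ValueError("n_gpu must be at least 1")
    return {"model_name": model_name, "model_type": model_type, "n_gpu": n_gpu}


def launch(endpoint: str, **kwargs) -> str:
    """Launch the model on a running Xinference server and return its UID."""
    # Imported lazily so the helper above works without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    return client.launch_model(**kwargs)


if __name__ == "__main__":
    kwargs = build_launch_kwargs("sd3-medium", "image", n_gpu=2)
    print(kwargs)
    # Requires a running server, e.g. the container started above:
    # model_uid = launch("http://localhost:9997", **kwargs)
```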

Sep 23 '24 08:09 qinxuye

Could you please answer the other questions as well?

Sep 23 '24 13:09 goactiongo

This issue is stale because it has been open for 7 days with no activity.

Oct 01 '24 19:10 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

Oct 06 '24 19:10 github-actions[bot]

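To load a model across multiple GPUs, select multiple GPUs in the n-gpu field in the UI.

Which UI do you mean? I cannot find this option on the Launch Model page of the web UI. I downloaded the sd3-medium model directly and used it, and then got this error, even though the GPUs actually still have plenty of free memory.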

Oct 09 '24 02:10 turndown

The launch page shows which GPU a model is actually using.

When you set up a model, n-gpu is the number of GPUs allocated to it.

Oct 09 '24 04:10 goactiongo

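Where can I select it? When I open other large language models, there seems to be an n-gpu option that can be set, but image models do not have these options.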

Oct 09 '24 04:10 turndown

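I did not check whether the option is available for your model. By default it is available for all models; this model may have a restriction.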

Oct 09 '24 05:10 goactiongo

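I did not check whether the option is available for your model. By default it is available for all models; this model may have a restriction.

It was my mistake. The version installed by default with pip was 0.13; after installing the latest 0.15 from source, these options are there. Thanks for your reply.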
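
A quick way to spot this kind of mismatch is to compare the installed package version with the release that shows the option. This is a minimal sketch, assuming the package is installed under the distribution name `xinference` and using 0.15 (the version mentioned above) as the threshold; the helper names are hypothetical, and pre-release suffixes in the minor component are not handled:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(version_string: str) -> tuple:
    """Extract (major, minor) from strings such as '0.15.2' or 'v0.13.0'."""
    parts = version_string.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def is_at_least(installed: str, required: str) -> bool:
    """Return True if the installed version is at or above the required one."""
    return parse_major_minor(installed) >= parse_major_minor(required)


if __name__ == "__main__":
    try:
        installed = version("xinference")
    except PackageNotFoundError:
        print("xinference is not installed in this environment")
    else:
        status = "ok" if is_at_least(installed, "0.15") else "too old"
        print(installed, status)
```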

Oct 09 '24 07:10 turndown