inference Qwen-Image-Edit-2509: very large speed difference between single-GPU and 2-GPU deployment

System Info

cuda 12.4

Running Xinference with Docker?

[x] docker
[ ] pip install
[ ] installation from source

Version info

xinference latest, 2 x L20 GPUs

The command used to start Xinference

1. Single-GPU deployment:
   xinference launch --endpoint "http://0.0.0.0:6600" --model_path /modelscope/Qwen-Image-Edit-2509/ --model-type image --model-uid qwen -n Qwen-Image-Edit-2509
2. 2-GPU deployment:
   xinference launch --endpoint "http://0.0.0.0:6600" --model_path /modelscope/Qwen-Image-Edit-2509/ --model-type image --model-uid qwen -n Qwen-Image-Edit-2509 --n-gpu 2

Reproduction

1. Single-GPU deployment:
   xinference launch --endpoint "http://0.0.0.0:6600" --model_path /modelscope/Qwen-Image-Edit-2509/ --model-type image --model-uid qwen -n Qwen-Image-Edit-2509
2. 2-GPU deployment:
   xinference launch --endpoint "http://0.0.0.0:6600" --model_path /modelscope/Qwen-Image-Edit-2509/ --model-type image --model-uid qwen -n Qwen-Image-Edit-2509 --n-gpu 2
3. With the same input, the single-GPU deployment takes nearly 10 minutes to generate one image, while the multi-GPU deployment takes about two minutes.
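To turn the rough timings in the reproduction into comparable numbers, a small timing harness can help. The following is only a sketch: `time_calls` and `summarize` are illustrative helpers that are not part of Xinference, and the callable passed in stands for whatever client request you use to generate one image against each deployment.

```python
import statistics
import time


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return the wall-clock seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # assumption: fn sends one image-generation request and blocks until done
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of per-call durations to min / median / max."""
    return {
        "runs": len(durations),
        "min_s": min(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }
```

Run it once against the single-GPU deployment and once against the 2-GPU deployment with the same input, for example `summarize(time_calls(generate_image))`, where `generate_image` is a hypothetical function wrapping your request. The first call may include warm-up cost, so comparing medians over several runs is usually more informative than comparing a single run.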

Expected behavior

What causes this difference?

Nov 05 '25 01:11 kelliaao

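In multi-GPU computation, communication between the GPUs accounts for most of the time. Unless the model does not fit on a single GPU, do not spread it across multiple GPUs.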
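One way to apply this advice is to check how much memory each GPU has free before deciding on `--n-gpu`. The sketch below assumes `nvidia-smi` is on the PATH; `parse_gpu_memory`, `query_gpu_memory`, and `gpus_that_fit` are illustrative names, and `required_mib` is a value you estimate yourself for the model, not something Xinference reports here.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (values in MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return gpus


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage and parse the result."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


def gpus_that_fit(gpus, required_mib):
    """Return the indices of GPUs whose free memory covers required_mib."""
    return [g["index"] for g in gpus if g["free_mib"] >= required_mib]
```

If `gpus_that_fit(query_gpu_memory(), required_mib)` returns at least one index, the model should, under this estimate, fit on a single GPU and the multi-GPU launch may not be needed.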

Nov 07 '25 03:11 qinxuye

This issue is stale because it has been open for 7 days with no activity.

Nov 14 '25 19:11 github-actions[bot]