
Issue with Multi-GPU Inference in Xinference Using vLLM for Model Loading

Open Bc-Aqr opened this issue 8 months ago • 0 comments

I am facing an issue when trying to use multiple GPUs simultaneously for inference with the vLLM engine in Xinference. The setup works correctly when a single GPU is used with a smaller model, but it fails when running multi-GPU inference for larger models. Below are the details of the problem and my environment setup.
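For context, this is roughly the kind of launch call involved. It is only a minimal sketch using the Xinference Python client: the endpoint, model name, and size are placeholders, and the exact parameter names (e.g. model_engine, n_gpu) may differ between Xinference versions.

```python
# Minimal sketch (not my exact command): launching a vLLM-backed model across
# both GPUs through the Xinference Python client. Endpoint, model name, and
# size are placeholders; parameter names may vary by Xinference version.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")   # assumed default Xinference endpoint

model_uid = client.launch_model(
    model_name="qwen2-instruct",           # placeholder "larger model"
    model_engine="vllm",                   # ask Xinference to load it with vLLM
    model_size_in_billions=72,             # large enough to need both cards
    n_gpu=2,                               # request both RTX 4090 D GPUs
)
print("launched model uid:", model_uid)
```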

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090 D      Off |   00000000:3B:00.0 Off |                  Off |
|  0%   35C    P8             20W /  425W |      13MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090 D      Off |   00000000:86:00.0 Off |                  Off |
| 30%   30C    P8              8W /  425W |      13MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
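For comparison, this is a sketch of what an equivalent tensor-parallel load looks like when calling vLLM directly on these two GPUs. The model name is a placeholder; it is shown only to illustrate the tensor_parallel_size=2 sharding that Xinference is expected to set up.

```python
# Sketch of a direct vLLM tensor-parallel load on the same two GPUs
# (placeholder model; shown only to illustrate the intended sharding).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",   # placeholder model id
    tensor_parallel_size=2,           # shard weights across both RTX 4090 D cards
    gpu_memory_utilization=0.90,
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```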

xinference.log

Bc-Aqr · Mar 05 '25 05:03