VITA
realtime server: CUDA_VISIBLE_DEVICES has no effect for vLLM, so both models load on the same GPU

Problem with the realtime server: CUDA_VISIBLE_DEVICES has no effect for vLLM, so both models are loaded onto a single GPU, which causes an OOM.
```python
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
llm = LLM(
    model=engine_args,
    dtype="float16",
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={'image': 256, 'audio': 50},
)
```
Passing `device` instead raises a different error:

```python
llm = LLM(
    model=engine_args,
    dtype="float16",
    device=cuda_devices,
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={'image': 256, 'audio': 50},
)
```

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
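For context (an explanatory note, not part of the original report): `CUDA_VISIBLE_DEVICES` is only consulted the first time a process initializes CUDA, so setting it after the server process has already touched the GPU is a no-op, and everything lands on the default device. A minimal standalone sketch of the ordering that does work, with the mask set before anything CUDA-related is imported:

```python
# CUDA_VISIBLE_DEVICES must be set BEFORE anything initializes CUDA in this
# process; once the CUDA context exists, changing it has no effect.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # set the mask before importing torch

import torch
print(torch.cuda.device_count())      # 1: only the masked-in device is visible
print(torch.cuda.get_device_name(0))  # cuda:0 inside this process is physical GPU 1
```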
Also, two 80 GB A100s don't seem to have enough memory:
```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 79.33 GiB of which 802.31 MiB is free. Process 4042162 has 49.78 GiB memory in use. Process 4042163 has 2.97 GiB memory in use. Process 4045381 has 22.37 GiB memory in use. Process 4045380 has 414.00 MiB memory in use. Process 4045382 has 2.97 GiB memory in use. Of the allocated memory 20.31 GiB is allocated by PyTorch, and 1.56 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
With tensor parallelism (`tensor_parallel_size=2`) there are additionally CUDA initialization errors.
1. How do I specify which GPU each model uses?
2. How do I deploy the models in parallel?
The latest code has been committed to fix deployment across two GPUs. The key change is to import the torch-related packages only after the subprocess has started.
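For readers who want the idea behind that fix, here is a minimal sketch (hypothetical function names and model paths, not the actual `server.py` code): each model runs in its own subprocess started with the `spawn` method, and torch/vLLM are imported only inside the child, after `CUDA_VISIBLE_DEVICES` has been set for that child.

```python
import os
import multiprocessing as mp

def serve_model(cuda_devices: str, model_path: str):
    # Set the device mask first; nothing in this fresh process has touched CUDA yet.
    os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
    # Deferred import: CUDA initializes inside the child, under the mask above.
    from vllm import LLM
    llm = LLM(model=model_path, dtype="float16", gpu_memory_utilization=0.85)
    # ... serve requests with llm ...

if __name__ == "__main__":
    # "spawn" starts a clean interpreter, so no CUDA state is inherited from the parent.
    mp.set_start_method("spawn")
    procs = [
        mp.Process(target=serve_model, args=("0", "/path/to/model_a")),
        mp.Process(target=serve_model, args=("1", "/path/to/model_b")),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

The same ordering issue is a common cause of CUDA initialization errors with `tensor_parallel_size=2`: worker processes forked after the parent has already initialized CUDA cannot re-initialize it.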
How exactly should this be used? @lxysl
Change `cuda_devices` here:
https://github.com/VITA-MLLM/VITA/blob/6a26b5cbe1472e9854072d4add674108ae5c6504/web_demo/server.py#L992
https://github.com/VITA-MLLM/VITA/blob/6a26b5cbe1472e9854072d4add674108ae5c6504/web_demo/server.py#L1013