VITA
realtime server: CUDA_VISIBLE_DEVICES has no effect for vLLM, so both models load on the same GPU

Problem with the realtime server: CUDA_VISIBLE_DEVICES has no effect for vLLM, so both models are loaded onto a single GPU, which causes an OOM.
```python
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
llm = LLM(
    model=engine_args,
    dtype="float16",
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={'image': 256, 'audio': 50},
)
```
Passing `device` instead raises a different error:

```python
llm = LLM(
    model=engine_args,
    dtype="float16",
    device=cuda_devices,
    tensor_parallel_size=1,
    trust_remote_code=True,
    gpu_memory_utilization=0.85,
    disable_custom_all_reduce=True,
    limit_mm_per_prompt={'image': 256, 'audio': 50},
)
```

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
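For context (an explanatory note, not part of the original report): `CUDA_VISIBLE_DEVICES` is only consulted the first time a process initializes CUDA, so setting it after the server process has already touched the GPU is a no-op, and everything lands on the default device. A minimal standalone sketch of the ordering that does work, with the mask set before anything CUDA-related is imported:

```python
# CUDA_VISIBLE_DEVICES must be set BEFORE anything initializes CUDA in this
# process; once the CUDA context exists, changing it has no effect.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # set the mask before importing torch

import torch
print(torch.cuda.device_count())      # 1: only the masked-in device is visible
print(torch.cuda.get_device_name(0))  # cuda:0 inside this process is physical GPU 1
```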
Also, two 80 GB A100s don't seem to have enough memory:
```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 79.33 GiB of which 802.31 MiB is free. Process 4042162 has 49.78 GiB memory in use. Process 4042163 has 2.97 GiB memory in use. Process 4045381 has 22.37 GiB memory in use. Process 4045380 has 414.00 MiB memory in use. Process 4045382 has 2.97 GiB memory in use. Of the allocated memory 20.31 GiB is allocated by PyTorch, and 1.56 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
With tensor parallelism (`tensor_parallel_size=2`) there are additionally CUDA initialization errors.
1. How do I specify which GPU each model uses?
2. How do I deploy the models in parallel?
The latest code has been committed to fix deployment across two GPUs. The key change is to import the torch-related packages only after the subprocess has started.
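For readers who want the idea behind that fix, here is a minimal sketch (hypothetical function names and model paths, not the actual `server.py` code): each model runs in its own subprocess started with the `spawn` method, and torch/vLLM are imported only inside the child, after `CUDA_VISIBLE_DEVICES` has been set for that child.

```python
import os
import multiprocessing as mp

def serve_model(cuda_devices: str, model_path: str):
    # Set the device mask first; nothing in this fresh process has touched CUDA yet.
    os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
    # Deferred import: CUDA initializes inside the child, under the mask above.
    from vllm import LLM
    llm = LLM(model=model_path, dtype="float16", gpu_memory_utilization=0.85)
    # ... serve requests with llm ...

if __name__ == "__main__":
    # "spawn" starts a clean interpreter, so no CUDA state is inherited from the parent.
    mp.set_start_method("spawn")
    procs = [
        mp.Process(target=serve_model, args=("0", "/path/to/model_a")),
        mp.Process(target=serve_model, args=("1", "/path/to/model_b")),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

The same ordering issue is a common cause of CUDA initialization errors with `tensor_parallel_size=2`: worker processes forked after the parent has already initialized CUDA cannot re-initialize it.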
How exactly should this be used? @lxysl
Change `cuda_devices` here:
https://github.com/VITA-MLLM/VITA/blob/6a26b5cbe1472e9854072d4add674108ae5c6504/web_demo/server.py#L992
https://github.com/VITA-MLLM/VITA/blob/6a26b5cbe1472e9854072d4add674108ae5c6504/web_demo/server.py#L1013