Coding-Zuo
A problem with the realtime server: CUDA_VISIBLE_DEVICES does not take effect for vLLM, so both models are loaded onto the same GPU, which causes an OOM.

    os.environ["CUDA_VISIBLE_DEVICES"] = cuda_devices
    llm = LLM(
        model=engine_args,
        dtype="float16",
        tensor_parallel_size=1,
        trust_remote_code=True,
        gpu_memory_utilization=0.85,
        disable_custom_all_reduce=True,
        limit_mm_per_prompt={'image':256,'audio':50}
    )

If I pass in device as well, a different error is raised:

    llm = LLM(
        model=engine_args,
        dtype="float16",
        **device= cuda_devices,**
        tensor_parallel_size=1,
        trust_remote_code=True,
        gpu_memory_utilization=0.85,
        disable_custom_all_reduce=True,...
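
One possible cause, not confirmed in this post, is that CUDA_VISIBLE_DEVICES is assigned after CUDA has already been initialized in the same process (for example by the first model). The variable is read when CUDA initializes, so later changes to os.environ would not move the second model. A minimal sketch of a workaround is to start each model in its own process with the variable pinned before anything imports torch or vllm. The helper names `child_env` and `run_pinned` and the `PROBE` snippet are hypothetical, and the probe only echoes the variable as a stand-in for the real worker that would build the `LLM`.

```python
import os
import subprocess
import sys


def child_env(cuda_devices: str) -> dict:
    """Return a copy of the current environment with CUDA_VISIBLE_DEVICES pinned."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = cuda_devices
    return env


def run_pinned(cuda_devices: str, code: str) -> str:
    """Run `code` in a fresh Python process that only sees `cuda_devices`."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env(cuda_devices),  # set before the child imports anything
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical stand-in for the real worker, which would import vllm
# and construct the LLM inside the child process.
PROBE = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"

if __name__ == "__main__":
    for devices in ("0", "1"):
        print(devices, "->", run_pinned(devices, PROBE))
```

Because each child gets its own environment, the parent's os.environ is never modified, and every model process sees only the GPU assigned to it.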