ms-swift
qwen2.5 vLLM engine fails to load across multiple GPU cards
Describe the bug
I am using get_vllm_engine to load Qwen2.5-72B-Instruct-GPTQ-Int4, and I have specified multiple CUDA devices via os.environ. However, I noticed that the model still loads on a single GPU.
Here is my code:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3,4,5,6,7,8'
os.environ['USE_HF'] = 'True'
from swift.llm import (
ModelType, get_vllm_engine, get_default_template_type,
get_template, inference_vllm, inference_stream_vllm
)
model_type = ModelType.qwen2_5_72b_instruct_gptq_int4
model_id_or_path = None
llm_engine = get_vllm_engine(model_type, model_id_or_path=model_id_or_path)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Interface similar to `transformers.GenerationConfig`
llm_engine.generation_config.max_new_tokens = 256
llm_engine.generation_config.do_sample = False
generation_info = {}
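A note on my expectation: as far as I understand, CUDA_VISIBLE_DEVICES only restricts which GPUs are visible to the process; vLLM still places the whole model on one of them unless a tensor-parallel degree is requested explicitly. A minimal sketch of what I mean, deriving the degree from the visible-device mask (the get_vllm_engine call at the end is my assumption based on the swift docs, untested here):

```python
import os

# Must be set before torch/vllm are imported, otherwise the mask is ignored.
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3,4,5,6,7,8'

# Derive the tensor-parallel degree from the number of visible devices.
num_gpus = len(os.environ['CUDA_VISIBLE_DEVICES'].split(','))
print(num_gpus)  # 8
```

If this is right, the engine would need to be created with something like `llm_engine = get_vllm_engine(model_type, tensor_parallel_size=num_gpus)` rather than relying on the environment variable alone; please correct me if the engine is supposed to shard automatically.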
Your hardware and system info
GPU: H100
CUDA version: 12.2
vllm: 0.6.1.post2
transformers: 4.44.2
torch: 2.4.0