ms-swift
qwen2.5 vLLM engine fails to load across multiple GPU cards
Describe the bug
I am using get_vllm_engine to load Qwen2.5-72B-Instruct-GPTQ-Int4, and I have specified multiple CUDA devices via os.environ. However, I noticed that the model still loads on a single GPU.
Here is my code:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3,4,5,6,7,8'
os.environ['USE_HF'] = 'True'
from swift.llm import (
ModelType, get_vllm_engine, get_default_template_type,
get_template, inference_vllm, inference_stream_vllm
)
model_type = ModelType.qwen2_5_72b_instruct_gptq_int4
model_id_or_path = None
llm_engine = get_vllm_engine(model_type, model_id_or_path=model_id_or_path)
template_type = get_default_template_type(model_type)
template = get_template(template_type, llm_engine.hf_tokenizer)
# Interface similar to `transformers.GenerationConfig`
llm_engine.generation_config.max_new_tokens = 256
llm_engine.generation_config.do_sample = False
generation_info = {}
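A note on my expectation: as far as I understand, CUDA_VISIBLE_DEVICES only restricts which GPUs are visible to the process; vLLM still places the whole model on one of them unless a tensor-parallel degree is requested explicitly. A minimal sketch of what I mean, deriving the degree from the visible-device mask (the get_vllm_engine call at the end is my assumption based on the swift docs, untested here):

```python
import os

# Must be set before torch/vllm are imported, otherwise the mask is ignored.
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2,3,4,5,6,7,8'

# Derive the tensor-parallel degree from the number of visible devices.
num_gpus = len(os.environ['CUDA_VISIBLE_DEVICES'].split(','))
print(num_gpus)  # 8
```

If this is right, the engine would need to be created with something like `llm_engine = get_vllm_engine(model_type, tensor_parallel_size=num_gpus)` rather than relying on the environment variable alone; please correct me if the engine is supposed to shard automatically.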
Your hardware and system info
GPU: H100
CUDA version: 12.2
vllm: 0.6.1.post2
transformers: 4.44.2
torch: 2.4.0