Qwen2 On NPU 910B Error

Open cason0126 opened this issue 1 year ago • 3 comments

When I run inference with the Qwen2 series of models on an Ascend 910B, some of the behavior is abnormal.

When I set top_p = 1.0, the output is clearly garbled. image

But when I set it to 0.9, the output looks normal. image
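For readers unfamiliar with the parameter, here is a minimal sketch of what top_p (nucleus) filtering does in general. It is not FastChat's or Transformers' actual implementation; the function name `top_p_filter` and the toy distribution are made up for illustration. It only shows that top_p = 1.0 leaves the whole low-probability tail in the sampling pool, while 0.9 cuts it off. Whether that tail has anything to do with the garbled output is not established in this thread.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Hypothetical next-token distribution over a 5-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.04, 0.01]

print(sorted(top_p_filter(probs, 0.9)))  # [0, 1, 2]: the tail is dropped
print(sorted(top_p_filter(probs, 1.0)))  # [0, 1, 2, 3, 4]: every token stays
```

With top_p = 1.0 the sampler can still pick any token in the vocabulary, so any problem in how the tail probabilities are computed or sampled would only show up in that setting.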

At first I thought it was a problem with the NPU, but then I ran the official code, like this:


from transformers import AutoModelForCausalLM, AutoTokenizer
device = "npu" # the device to load the model onto
max_memory = {0:"60GiB"}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-14B-Chat",
    torch_dtype="auto",
    device_map="auto",
    max_memory = max_memory,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-14B-Chat")

prompt = "你好,你叫什么"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=1.0,
    repetition_penalty=1.0
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]


With this code, the result is correct even when I set top_p = 1.0. The result is: image

Both approaches were run in the same environment: FastChat 0.2.36, Transformers 4.37.0.

So I've ruled out the issue of the environment for now.

Why is this happening?

cason0126 avatar Feb 29 '24 12:02 cason0126

I have a 910B; how do I deploy on it? It seems only 1.0 supports MindSpore. Can you share how to deploy 1.5 or 2?

rickywu avatar May 20 '24 10:05 rickywu

I have a 910B; how do I deploy on it? It seems only 1.0 supports MindSpore. Can you share how to deploy 1.5 or 2?

It depends on whether you use the CLI or the worker; just specify device = npu.

cason0126 avatar Jun 17 '24 09:06 cason0126

@cason0126 The worker? When I use the worker, I keep getting the error below: data: {"text": "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(ACL stream synchronize failed, error code:507018)", "error_code": 50001}

ganisback avatar Aug 11 '24 01:08 ganisback