FastChat
Qwen2 On NPU 910B Error
When I use the Qwen2 series of models for inference on an Ascend 910B, something abnormal happens.
When I set top_p = 1.0, the output is garbled, and obviously so.
But when I set it to 0.9, it looks normal.
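For context, here is a minimal pure-Python sketch of nucleus (top-p) filtering (illustrative only, not FastChat's actual implementation): with top_p = 1.0 the filter keeps every token, so any junk probability mass in the tail of the distribution (e.g. from a numerical issue on the NPU) can still be sampled, while top_p = 0.9 truncates that tail.

```python
import math

def top_p_filter(logits, top_p):
    # Hypothetical helper, not FastChat's real code: mask the
    # low-probability tail to -inf, keeping the smallest set of
    # tokens whose cumulative probability exceeds top_p.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    z = sum(math.exp(v) for v in logits)
    kept = [float("-inf")] * len(logits)
    cum = 0.0
    for idx in order:
        kept[idx] = logits[idx]          # the top token is always kept
        cum += math.exp(logits[idx]) / z
        if cum > top_p:                  # drop everything past top_p mass
            break
    return kept

logits = [3.0, 2.0, 1.0, -5.0]
print(top_p_filter(logits, 0.9))  # tail tokens masked to -inf
print(top_p_filter(logits, 1.0))  # every token survives the filter
```

With top_p = 0.9 only the two most likely tokens survive; with top_p = 1.0 nothing is filtered at all, which is why a bad tail distribution only shows up at that setting.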
At first I thought it was an NPU problem, but then I used the official code, like:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_npu  # Ascend PyTorch adapter; registers the "npu" device

device = "npu"  # the device to load the model onto
max_memory = {0: "60GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-14B-Chat",
    torch_dtype="auto",
    device_map="auto",
    max_memory=max_memory,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-14B-Chat")

prompt = "你好,你叫什么"  # "Hello, what is your name?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=1.0,
    repetition_penalty=1.0,
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
With this code, the result is correct even when I set top_p = 1.0.
Both runs use the same environment: FastChat = 0.2.36, Transformers = 4.37.0.
So I've ruled out an environment issue for now.
Why is this happening?
I have a 910B, how to deploy? It seems only 1.0 supports MindSpore; can you share how to deploy 1.5 or 2?
It depends on whether you use the CLI or a worker; just set device = npu.
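A minimal sketch of what that looks like, assuming FastChat 0.2.x's standard entry points (the model path is just an example):

```shell
# CLI chat directly on the NPU
python3 -m fastchat.serve.cli --model-path Qwen/Qwen1.5-14B-Chat --device npu

# Or the worker route: controller + model worker + OpenAI-compatible API server
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path Qwen/Qwen1.5-14B-Chat --device npu
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```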
@cason0126 Worker? When I use a worker, I keep getting the error below: data: {"text": "NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(ACL stream synchronize failed, error code:507018)", "error_code": 50001}