[Bug]: deepseek_v2 236B on 8XA100 wrong output vllm==0.5.4
Your current environment
Wrong output:

```text
Prompt: 'Funniest joke ever:', Generated text: '!!!!!!!!!!!!!!!!!!'
Prompt: 'The capital of France is:', Generated text: '!!!!!!!!!!!!!!!!!!'
Prompt: 'The future of AI is:', Generated text: '!!!!!!!!!!!!!!!!!!'
```
🐛 Describe the bug
```python
from vllm import LLM, SamplingParams
import argparse
import torch


def generate(args, prompts):
    sampling_params = SamplingParams(temperature=0.8, top_k=1, max_tokens=20)
    llm = LLM(model=args.model_path,
              trust_remote_code=True,
              max_model_len=2048,
              worker_use_ray=True,
              enforce_eager=True,
              dtype=torch.half,
              tensor_parallel_size=8,
              enable_chunked_prefill=False)
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', type=str)
    args = parser.parse_args()
    prompts = [
        "Funniest joke ever:",
        "The capital of France is:",
        "The future of AI is:",
    ]
    generate(args, prompts)
```
Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
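For reference, a minimal sketch of the suggested change applied to the reproduction script above; the model path is a placeholder and the other arguments mirror the original report, only `dtype` differs:

```python
# Sketch of the suggested fix: load the model in bfloat16 instead of float16.
# The model path is a placeholder; the other arguments mirror the repro script.
import torch
from vllm import LLM

llm = LLM(model="/path/to/DeepSeek-V2",   # placeholder for --model_path
          trust_remote_code=True,
          max_model_len=2048,
          worker_use_ray=True,
          enforce_eager=True,
          dtype=torch.bfloat16,            # was torch.half
          tensor_parallel_size=8,
          enable_chunked_prefill=False)
```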
> @shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
Thank you so much for your reply. Using torch.bfloat16 gives a correct result.
> @shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
Hi! I also want to know why torch.half gives wrong results. Looking forward to your reply.
@shuailong616 Of course, you need to match the dtype of the original model. Models can be sensitive to dtypes.
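A likely mechanism: DeepSeek-V2 checkpoints are published in bfloat16, and loading a bf16 model in float16 can overflow activations (float16 tops out around 65504), which is consistent with the degenerate `!!!` output. Below is a minimal sketch for checking a checkpoint's native dtype, assuming the Hugging Face `transformers` package and using the public DeepSeek-V2 repo id as an example:

```python
# Sketch: check the dtype a checkpoint was saved in, so vLLM's `dtype`
# argument can be set to match it. The repo id below is only an example.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V2", trust_remote_code=True)
print(config.torch_dtype)  # the checkpoint's native dtype, e.g. torch.bfloat16
```

Leaving vLLM's `dtype` at its default (`"auto"`) should also pick up bfloat16 from this config field for bf16 checkpoints.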