[Bug]: deepseek_v2 236B on 8XA100 wrong output vllm==0.5.4
Your current environment
Wrong output:

```text
Prompt: 'Funniest joke ever:', Generated text: '!!!!!!!!!!!!!!!!!!'
Prompt: 'The capital of France is:', Generated text: '!!!!!!!!!!!!!!!!!!'
Prompt: 'The future of AI is:', Generated text: '!!!!!!!!!!!!!!!!!!'
```
🐛 Describe the bug
```python
from vllm import LLM, SamplingParams
import argparse
import torch


def generate(args, prompts):
    sampling_params = SamplingParams(temperature=0.8, top_k=1, max_tokens=20)
    llm = LLM(model=args.model_path,
              trust_remote_code=True,
              max_model_len=2048,
              worker_use_ray=True,
              enforce_eager=True,
              dtype=torch.half,
              tensor_parallel_size=8,
              enable_chunked_prefill=False)
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', type=str)
    args = parser.parse_args()
    prompts = [
        "Funniest joke ever:",
        "The capital of France is:",
        "The future of AI is:",
    ]
    generate(args, prompts)
```
Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
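For reference, a minimal sketch of the suggested change applied to the reproduction script above; the model path is a placeholder and the other arguments mirror the original report, only `dtype` differs:

```python
# Sketch of the suggested fix: load the model in bfloat16 instead of float16.
# The model path is a placeholder; the other arguments mirror the repro script.
import torch
from vllm import LLM

llm = LLM(model="/path/to/DeepSeek-V2",   # placeholder for --model_path
          trust_remote_code=True,
          max_model_len=2048,
          worker_use_ray=True,
          enforce_eager=True,
          dtype=torch.bfloat16,            # was torch.half
          tensor_parallel_size=8,
          enable_chunked_prefill=False)
```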
> @shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
Thank you so much for your reply. Using torch.bfloat16 gives a correct result.
> @shuailong616 Hi! Can you try swapping `dtype=torch.half` to `dtype=torch.bfloat16`?
Hi! I also want to know why torch.half gives wrong results. Looking forward to your reply.
@shuailong616 Of course, you need to match the dtype of the original model. Models can be sensitive to dtypes.
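A likely mechanism: DeepSeek-V2 checkpoints are published in bfloat16, and loading a bf16 model in float16 can overflow activations (float16 tops out around 65504), which is consistent with the degenerate `!!!` output. Below is a minimal sketch for checking a checkpoint's native dtype, assuming the Hugging Face `transformers` package and using the public DeepSeek-V2 repo id as an example:

```python
# Sketch: check the dtype a checkpoint was saved in, so vLLM's `dtype`
# argument can be set to match it. The repo id below is only an example.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V2", trust_remote_code=True)
print(config.torch_dtype)  # the checkpoint's native dtype, e.g. torch.bfloat16
```

Leaving vLLM's `dtype` at its default (`"auto"`) should also pick up bfloat16 from this config field for bf16 checkpoints.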