[Bug]: with dtype='bfloat16', different batch sizes produce different inference results
Your current environment
```python
# Reproduction script
from vllm import LLM, SamplingParams
import datasets

raw_datasets = datasets.load_dataset("truthful_qa", "generation")
questions = [i["question"] for i in raw_datasets["validation"]]

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", dtype="bfloat16", trust_remote_code=True)
sampling_params = SamplingParams(
    temperature=0,   # greedy decoding, so outputs should be deterministic
    max_tokens=256,
)

# Run the same prompts at different batch sizes and print the first few completions.
for batch_size in [32, 64, 256]:
    outputs = llm.generate(
        questions[:batch_size],
        sampling_params,
    )
    for o in outputs[:5]:
        print(o.outputs[0].text)
        print()
    print('------------------------------')
```
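To make the divergence easier to quantify, here is a minimal sketch (assuming `llm`, `questions`, and `sampling_params` are defined as in the script above; the `results` dict and the mismatch count are illustrative additions, not part of the original report) that counts how many of the first 32 completions change between batch sizes:

```python
# Collect the first 32 completions for each batch size and compare them
# against the batch_size=32 run (illustrative helper, not from the report).
results = {}
for batch_size in [32, 64, 256]:
    outputs = llm.generate(questions[:batch_size], sampling_params)
    results[batch_size] = [o.outputs[0].text for o in outputs[:32]]

baseline = results[32]
for batch_size in [64, 256]:
    mismatches = sum(a != b for a, b in zip(baseline, results[batch_size]))
    print(f"batch_size={batch_size}: {mismatches}/32 completions differ from batch_size=32")
```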
🐛 Describe the bug
With dtype='bfloat16', running the script above with different batch sizes produces obviously different outputs for the same prompts.
The issue does not occur with float16 or float32.
vllm version: 0.4.1
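For context, a commonly cited explanation (an assumption on my part, not confirmed in this thread) is that different batch sizes can take different GPU kernel/tiling paths, so floating-point reductions are accumulated in a different order, and bfloat16's low precision makes that ordering visible even under greedy decoding. A small PyTorch sketch of that order sensitivity:

```python
import torch

torch.manual_seed(0)
x32 = torch.randn(4096)          # float32 reference values
x16 = x32.to(torch.bfloat16)     # the same values in bfloat16

def sum_sequential(t):
    # Accumulate strictly left-to-right in the tensor's own dtype.
    acc = torch.zeros((), dtype=t.dtype)
    for v in t:
        acc = acc + v
    return acc

def sum_chunked(t, chunk=64):
    # Sum 64-element chunks first, then sum the partial results,
    # mimicking a different tiling/reduction order.
    partials = [sum_sequential(c) for c in t.split(chunk)]
    return sum_sequential(torch.stack(partials))

print("bf16 order sensitivity:", (sum_sequential(x16) - sum_chunked(x16)).abs().item())
print("fp32 order sensitivity:", (sum_sequential(x32) - sum_chunked(x32)).abs().item())
```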
I hit the same issue when using vLLM to evaluate HumanEval. Do you know the reason? Thanks