[Bug]: with dtype='bfloat16', different batch sizes produce different inference results
Your current environment
```python
# Reproduction script
from vllm import LLM, SamplingParams
import datasets

raw_datasets = datasets.load_dataset("truthful_qa", "generation")
questions = [i["question"] for i in raw_datasets["validation"]]

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", dtype="bfloat16", trust_remote_code=True)
sampling_params = SamplingParams(
    temperature=0,   # greedy decoding, so outputs should be deterministic
    max_tokens=256,
)

# Run the same prompts at different batch sizes and print the first few completions.
for batch_size in [32, 64, 256]:
    outputs = llm.generate(
        questions[:batch_size],
        sampling_params,
    )
    for o in outputs[:5]:
        print(o.outputs[0].text)
        print()
    print('------------------------------')
```
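To make the divergence easier to quantify, here is a minimal sketch (assuming `llm`, `questions`, and `sampling_params` are defined as in the script above; the `results` dict and the mismatch count are illustrative additions, not part of the original report) that counts how many of the first 32 completions change between batch sizes:

```python
# Collect the first 32 completions for each batch size and compare them
# against the batch_size=32 run (illustrative helper, not from the report).
results = {}
for batch_size in [32, 64, 256]:
    outputs = llm.generate(questions[:batch_size], sampling_params)
    results[batch_size] = [o.outputs[0].text for o in outputs[:32]]

baseline = results[32]
for batch_size in [64, 256]:
    mismatches = sum(a != b for a, b in zip(baseline, results[batch_size]))
    print(f"batch_size={batch_size}: {mismatches}/32 completions differ from batch_size=32")
```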
🐛 Describe the bug
With dtype='bfloat16', running the script above with different batch sizes produces obviously different outputs for the same prompts.
The issue does not occur with float16 or float32.
vllm version: 0.4.1
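For context, a commonly cited explanation (an assumption on my part, not confirmed in this thread) is that different batch sizes can take different GPU kernel/tiling paths, so floating-point reductions are accumulated in a different order, and bfloat16's low precision makes that ordering visible even under greedy decoding. A small PyTorch sketch of that order sensitivity:

```python
import torch

torch.manual_seed(0)
x32 = torch.randn(4096)          # float32 reference values
x16 = x32.to(torch.bfloat16)     # the same values in bfloat16

def sum_sequential(t):
    # Accumulate strictly left-to-right in the tensor's own dtype.
    acc = torch.zeros((), dtype=t.dtype)
    for v in t:
        acc = acc + v
    return acc

def sum_chunked(t, chunk=64):
    # Sum 64-element chunks first, then sum the partial results,
    # mimicking a different tiling/reduction order.
    partials = [sum_sequential(c) for c in t.split(chunk)]
    return sum_sequential(torch.stack(partials))

print("bf16 order sensitivity:", (sum_sequential(x16) - sum_chunked(x16)).abs().item())
print("fp32 order sensitivity:", (sum_sequential(x32) - sum_chunked(x32)).abs().item())
```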
I hit the same issue when using vLLM to evaluate HumanEval. Do you know the reason? Thanks