from_pretrained torch_dtype does not affect model buffers
System Info
pass
Who can help?
@ArthurZucker
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I recently encountered a problem with inconsistent model outputs. After debugging, I found the cause.
If I pass torch_dtype=torch.bfloat16 to the from_pretrained method, it only affects the model parameters and has no effect on the model buffers: the buffers stay float32.
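A minimal sketch of what I observe (gpt2 is only an example checkpoint here; which floating-point buffers a model registers depends on the architecture and the transformers version):

```python
import torch
from transformers import AutoModel

# Request bfloat16 at load time.
model = AutoModel.from_pretrained("gpt2", torch_dtype=torch.bfloat16)

# All parameters come back in the requested dtype.
print({p.dtype for p in model.parameters()})  # {torch.bfloat16}

# Floating-point buffers are left untouched.
print({b.dtype for b in model.buffers() if b.is_floating_point()})
# {torch.float32} in my environment
```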
But with the Trainer flag --bf16_full_eval=True, model.to(torch.bfloat16) is used instead, so that all parameters and buffers of the model become bfloat16.
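For comparison, calling .to() directly converts the buffers as well, which to my understanding is what the Trainer path effectively does:

```python
# .to() converts parameters *and* buffers, so nothing stays float32.
model = model.to(torch.bfloat16)
print({b.dtype for b in model.buffers() if b.is_floating_point()})
# {torch.bfloat16}
```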
And that is why I see different logits from the same pretrained model.
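The difference can be reproduced by comparing the two loading paths on the same checkpoint (again just a sketch; gpt2 and the prompt are arbitrary choices, and the size of the gap depends on how the buffers are used in the forward pass):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
inputs = tok("hello world", return_tensors="pt")

# Path 1: dtype requested at load time -> buffers stay float32.
m1 = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
# Path 2: full cast after loading -> buffers become bfloat16 too.
m2 = AutoModelForCausalLM.from_pretrained("gpt2").to(torch.bfloat16)

with torch.no_grad():
    l1 = m1(**inputs).logits
    l2 = m2(**inputs).logits

# Non-zero whenever the buffer dtypes change the numerics.
print((l1 - l2).abs().max())
```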
So the question is: is torch_dtype in from_pretrained designed to affect only the parameters and not the buffers, or is this a bug?
Expected behavior
I'm not sure what the behavior should be, but it would be nice to see some documentation about it.