
from_pretrained torch_dtype does not affect model buffers

Open Chandler-Bing opened this issue 1 year ago • 0 comments

System Info

pass

Who can help?

@ArthurZucker

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I recently encountered a problem with inconsistent model outputs. After debugging, I found the cause.

If I pass torch_dtype=torch.bfloat16 to the from_pretrained method, it only affects the model parameters and has no effect on the model buffers: the buffers stay float32. But when the Trainer is configured with --bf16_full_eval=True, model.to(bfloat16) is called, which casts all parameters and buffers of the model to bfloat16. That is why I see different logits from the same pretrained model.

So the question is: is from_pretrained's torch_dtype designed to affect only the parameters and not the buffers, or is this a bug?
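A minimal sketch of the discrepancy, using a tiny torch module with a registered buffer in place of an actual pretrained checkpoint (the per-parameter cast below is an assumption about what from_pretrained effectively does with torch_dtype; TinyModel and its buffer name are made up for illustration):

```python
import torch
from torch import nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Buffers are registered state, not learned parameters
        # (e.g. positional-embedding caches in real models).
        self.register_buffer("cache", torch.zeros(4))

model = TinyModel()

# Mimic from_pretrained(torch_dtype=torch.bfloat16) as described in
# this issue: cast only the parameters, leaving buffers untouched.
for p in model.parameters():
    p.data = p.data.to(torch.bfloat16)

param_dtypes = {p.dtype for p in model.parameters()}
buffer_dtypes = {b.dtype for b in model.buffers()}
# Parameters are bfloat16 here, but the buffer is still float32.

# What --bf16_full_eval triggers: model.to() casts parameters AND
# floating-point buffers, so the two code paths diverge.
model.to(torch.bfloat16)
buffer_dtypes_after_to = {b.dtype for b in model.buffers()}
```

With the per-parameter cast, any computation that mixes a bfloat16 weight with a float32 buffer runs in a different precision than the fully-cast model, which would explain the differing logits.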

Expected behavior

I'm not sure what the behavior should be, but it would be nice to see some docs about it.

Chandler-Bing avatar May 08 '24 11:05 Chandler-Bing