from_pretrained torch_dtype does not affect model buffers
System Info
pass
Who can help?
@ArthurZucker
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I recently encountered a problem with inconsistent model outputs. After debugging, I found the cause.
If I pass torch_dtype=torch.bfloat16 to the from_pretrained method, it only affects the model parameters and has no effect on the model buffers: the buffers stay float32.
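A minimal sketch of what I observe (gpt2 is only an example checkpoint here; which floating-point buffers a model registers depends on the architecture and the transformers version):

```python
import torch
from transformers import AutoModel

# Request bfloat16 at load time.
model = AutoModel.from_pretrained("gpt2", torch_dtype=torch.bfloat16)

# All parameters come back in the requested dtype.
print({p.dtype for p in model.parameters()})  # {torch.bfloat16}

# Floating-point buffers are left untouched.
print({b.dtype for b in model.buffers() if b.is_floating_point()})
# {torch.float32} in my environment
```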
But with the Trainer flag --bf16_full_eval=True, model.to(torch.bfloat16) is used instead, so that all parameters and buffers of the model become bfloat16.
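For comparison, calling .to() directly converts the buffers as well, which to my understanding is what the Trainer path effectively does:

```python
# .to() converts parameters *and* buffers, so nothing stays float32.
model = model.to(torch.bfloat16)
print({b.dtype for b in model.buffers() if b.is_floating_point()})
# {torch.bfloat16}
```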
And that is why I see different logits from the same pretrained model.
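The difference can be reproduced by comparing the two loading paths on the same checkpoint (again just a sketch; gpt2 and the prompt are arbitrary choices, and the size of the gap depends on how the buffers are used in the forward pass):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
inputs = tok("hello world", return_tensors="pt")

# Path 1: dtype requested at load time -> buffers stay float32.
m1 = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)
# Path 2: full cast after loading -> buffers become bfloat16 too.
m2 = AutoModelForCausalLM.from_pretrained("gpt2").to(torch.bfloat16)

with torch.no_grad():
    l1 = m1(**inputs).logits
    l2 = m2(**inputs).logits

# Non-zero whenever the buffer dtypes change the numerics.
print((l1 - l2).abs().max())
```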
So the question is: is torch_dtype in from_pretrained designed to affect only the parameters and not the buffers, or is this a bug?
Expected behavior
I'm not sure what the behavior should be, but it would be nice to see some documentation about it.