Model is not initialized correctly when path to a pretrained model is provided via `pre_trained`
Description
I use a script similar to cola.sh to train and/or evaluate a model for sequence classification. There are two parameters for model state files: `init_model` and `pre_trained`. I expect the model weights to be loaded from `pre_trained` when it is provided, while the vocabulary is loaded based on `init_model` if `init_model` is one of the provided pretrained models. However, the model parameters are actually loaded from `init_model` only, because the `pre_trained` flag has no effect in this function, even though I expect `pre_trained` to override `init_model`.
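For context, this is the behaviour I would expect `pre_trained` to trigger, sketched with plain PyTorch calls. `apply_pretrained` is a hypothetical helper, not part of the repo, and the unwrapping of a possible `"model"` key is an assumption about the checkpoint layout:

```python
import torch

def apply_pretrained(model: torch.nn.Module, pre_trained: str) -> None:
    """Override the weights initialized from init_model with the
    checkpoint passed via pre_trained."""
    state_dict = torch.load(pre_trained, map_location="cpu")
    # Assumption: the checkpoint is either a raw state dict or nested
    # under a "model" key; unwrap it in the latter case.
    if isinstance(state_dict, dict) and "model" in state_dict:
        state_dict = state_dict["model"]
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```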
Steps to reproduce
- Set `init_model` to `deberta-v3-base`
- Set `pre_trained` to $PATH_TO_MY_MODEL, which is a path to the pretrained mDeBERTa-V3-Base, for example
- Check the model parameters after loading, e.g. `print(model.deberta.encoder.layer[7].output.dense.weight[:5,:4])` after this line
- Expected result (mDeBERTa-v3-base):
  ```
  tensor([[-0.0212,  0.0130,  0.0446,  0.0156],
          [ 0.0811,  0.0023,  0.0057, -0.0301],
          [-0.0190,  0.0097, -0.0114,  0.0306],
          [ 0.0049, -0.0174,  0.0064, -0.0275],
          [-0.0152, -0.0411, -0.0166, -0.0447]], dtype=torch.float16)
  ```
- Actual result (DeBERTa-v3-base):
  ```
  tensor([[ 0.0278, -0.0206, -0.0062,  0.0368],
          [ 0.0262, -0.0676,  0.0477,  0.0249],
          [-0.0364,  0.0453,  0.0912,  0.0590],
          [-0.0638,  0.0402,  0.0272, -0.0013],
          [-0.0352, -0.0579,  0.0320,  0.0003]], grad_fn=<SliceBackward0>)
  ```
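To double-check which checkpoint the in-memory weights actually come from, I compare the same slice directly in both checkpoint files. The paths below are placeholders, and the state-dict key is an assumption that may need adjusting to the checkpoint's naming scheme:

```python
import torch

# Placeholder paths; substitute the actual checkpoint files.
CHECKPOINTS = {
    "deberta-v3-base": "/path/to/deberta-v3-base/pytorch_model.bin",
    "mdeberta-v3-base": "/path/to/mdeberta-v3-base/pytorch_model.bin",
}

# Assumed key mirroring model.deberta.encoder.layer[7].output.dense.weight;
# adjust the prefix if the checkpoint uses a different naming scheme.
KEY = "deberta.encoder.layer.7.output.dense.weight"

for name, path in CHECKPOINTS.items():
    state_dict = torch.load(path, map_location="cpu")
    print(name)
    print(state_dict[KEY][:5, :4])
```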
Additional information/Environment
My system setup is:
- PyTorch 1.10.0+cu113
- GPU: NVIDIA GeForce GTX 1080 Ti