
Model is not initialized correctly when path to a pretrained model is provided via `pre_trained`

Open ThuongTNguyen opened this issue 1 year ago • 0 comments

Description

I use a script similar to cola.sh to train and/or evaluate a model for sequence classification. There are two parameters for model state files: `init_model` and `pre_trained`. I expect the model weights to be loaded from `pre_trained` when it is provided, while the vocabulary is loaded based on `init_model` (when `init_model` is one of the bundled pretrained models). However, the model parameters are actually loaded from `init_model` only. That is because the `pre_trained` flag has no effect in this function, even though `pre_trained` should override `init_model`.

Steps to reproduce

  • Set init_model to deberta-v3-base
  • Set `pre_trained` to $PATH_TO_MY_MODEL, a path to, for example, the pretrained mDeBERTa-v3-base model
  • Check the model parameters after loading, e.g. `print(model.deberta.encoder.layer[7].output.dense.weight[:5,:4])` after this line
    • Expected result (mDeBERTa-v3-base): tensor([[-0.0212, 0.0130, 0.0446, 0.0156], [ 0.0811, 0.0023, 0.0057, -0.0301], [-0.0190, 0.0097, -0.0114, 0.0306], [ 0.0049, -0.0174, 0.0064, -0.0275], [-0.0152, -0.0411, -0.0166, -0.0447]], dtype=torch.float16)
    • Actual result (DeBERTa-v3-base): tensor([[ 0.0278, -0.0206, -0.0062, 0.0368], [ 0.0262, -0.0676, 0.0477, 0.0249], [-0.0364, 0.0453, 0.0912, 0.0590], [-0.0638, 0.0402, 0.0272, -0.0013], [-0.0352, -0.0579, 0.0320, 0.0003]], grad_fn=<SliceBackward0>)

Additional information/Environment

My system setup is:

  • PyTorch 1.10.0+cu113
  • GPU: NVIDIA GeForce GTX 1080 Ti

ThuongTNguyen — Dec 09 '23