
Bad results when using models downloaded from Hugging Face

Open · beanandrew opened this issue 3 years ago · 1 comment

Hi, I tried to reproduce your work with the PyTorch BERT model downloaded from Hugging Face, only to get very bad results: the training loss stays around 1.0 for the first 10 epochs. But when I follow your instructions instead — download the Google BERT model and convert it with the helper script — training seems to go well. Why is this happening? Are the two models really that different? Hugging Face download link: https://huggingface.co/bert-base-uncased/tree/main

beanandrew · Nov 04 '21 08:11

Hi, thanks for your comment. I believe the reason is the variable naming.

If you look at this line in the training set-up, https://github.com/frankaging/Quasi-Attention-ABSA/blob/main/code/util/train_helper.py#L300, the call `model.bert.load_state_dict(torch.load(init_checkpoint, map_location='cpu'), strict=False)` loads parameters by name. With the current code, the HuggingFace model uses different names for all of the BERT variables, so with `strict=False` you end up loading none of the pre-trained BERT weights — and no error is raised. You can verify this by printing out the weights before and after this line; they will be unchanged.
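To illustrate the failure mode, here is a minimal, self-contained sketch (with hypothetical toy modules, not the repo's actual BERT classes) showing that `load_state_dict(..., strict=False)` matches parameters purely by name and silently skips everything that does not match:

```python
import torch
import torch.nn as nn

# Two toy models with identical shapes but different parameter names.
class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)        # names: encoder.weight, encoder.bias

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert_encoder = nn.Linear(4, 4)   # names: bert_encoder.weight, bert_encoder.bias

pretrained = ModelA().state_dict()
model = ModelB()

# With strict=False no error is raised, but nothing is actually loaded:
# every pretrained key is "unexpected" and every model key is "missing".
result = model.load_state_dict(pretrained, strict=False)
print(result.missing_keys)     # ['bert_encoder.weight', 'bert_encoder.bias']
print(result.unexpected_keys)  # ['encoder.weight', 'encoder.bias']
```

Inspecting the returned `missing_keys`/`unexpected_keys` (or printing a weight tensor before and after the load, as suggested above) is a quick way to confirm whether any pre-trained weights were actually applied.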

There are two solutions: (1) import the pre-trained weights from the Google checkpoint, as you are doing now, or (2) change the code so it works with both models. The second approach requires you to modify the model's variable names.
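For option (2), one way is to remap the checkpoint's keys before loading rather than renaming the model's modules. The sketch below is illustrative only — the prefix mapping shown is hypothetical and not the repo's actual naming scheme; you would need to work out the real correspondence by comparing the two sets of key names:

```python
import torch

def remap_keys(state_dict, prefix_map):
    """Return a copy of state_dict with key prefixes rewritten per prefix_map."""
    remapped = {}
    for name, tensor in state_dict.items():
        for old, new in prefix_map.items():
            if name.startswith(old):
                name = new + name[len(old):]
                break
        remapped[name] = tensor
    return remapped

# Toy example: strip a leading "bert." prefix from a HuggingFace-style key.
hf_state = {"bert.embeddings.word_embeddings.weight": torch.zeros(2, 2)}
fixed = remap_keys(hf_state, {"bert.": ""})
print(list(fixed.keys()))  # ['embeddings.word_embeddings.weight']
```

Once the remapped dict's keys line up with the model's `state_dict()` keys, a plain `load_state_dict(fixed)` with the default `strict=True` will raise an error on any remaining mismatch, which is safer here than `strict=False`.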

Does this make sense?

frankaging · Nov 04 '21 18:11