Can I finetune CPTForMaskedLM?
First, I would like to thank you for the great work. Much appreciated.
As stated in the title, I would like to try fine-tuning CPTForMaskedLM, and I'm not sure whether I can just fine-tune the decoder by training on the output logits. Sorry for this naive question, as I'm new to this field. Thank you.
Sure! If you calculate the loss on the G-Dec logits, you can fine-tune both the CPT Encoder and the G-Decoder; in this case the U-Decoder is not used. If you want to tune only the G-Dec and leave the Encoder unchanged, you can freeze the Encoder by not passing its parameters to the optimizer, so that only the G-Dec parameters are updated.
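For example, here is a minimal sketch of freezing the Encoder and optimizing only the remaining parameters. The attribute path model.model.encoder is an assumption based on the BART-style layout of modeling_cpt; adjust it if the module names differ.

import torch
from modeling_cpt import CPTForMaskedLM

model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large")

# Freeze every Encoder parameter so it receives no gradient updates.
# NOTE: model.model.encoder is an assumed attribute path, not confirmed by the repo.
for p in model.model.encoder.parameters():
    p.requires_grad = False

# Hand the optimizer only the parameters that are still trainable (G-Dec, lm_head, ...).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)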
Thanks a lot for your reply. During the fine-tuning of CPTForMaskedLM, I need to add tokens to the tokenizer (BertTokenizer.from_pretrained("fnlp/cpt-large")) by calling tokenizer.add_tokens; afterwards, I call model.resize_token_embeddings. All good up to this point, but when I call the model's forward pass, the dimension of final_logits_bias does not match. My sample code and output are below (I omitted some unnecessary details):
Code:
from transformers import BertTokenizer
from modeling_cpt import CPTForMaskedLM

model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large").cuda()
t = BertTokenizer.from_pretrained("fnlp/cpt-large")
t.add_tokens(["[SPL]"])                 # register the new token
model.resize_token_embeddings(len(t))   # resize the shared embedding matrix
model(input_ids=...)                    # forward pass raises the error below
Output:
>>> from modeling_cpt import CPTForMaskedLM
>>> model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large").cuda()
rge")
t.add_tokens(["[SPL]"])
model.resize_token_embeddings(len(t))
>>> t = BertTokenizer.from_pretrained("fnlp/cpt-large")
>>> t.add_tokens(["[SPL]"])
1
>>> model.resize_token_embeddings(len(t))
Embedding(51272, 1024)
>>> model(input_ids=...)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File ".../modeling_cpt.py", line 1497, in forward
dec_logits = self.lm_head(hidden_states) + self.final_logits_bias
RuntimeError: The size of tensor a (51272) must match the size of tensor b (51271) at non-singleton dimension 2
I brute-forced a fix by matching the dimension of self.final_logits_bias with model.register_buffer("final_logits_bias", torch.zeros((1, model.model.shared.num_embeddings)).cuda()). I wonder whether I can do that or if there is a better way. Any hints? Thanks a lot.
This fix is OK, since final_logits_bias is not trained and is always zero. The functions for adding new tokens to CPT are not implemented, as they are not needed for pre-training or fine-tuning. A more elegant option would be to re-implement resize_token_embeddings for CPT, if you want (see the sketch below).
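For reference, here is a minimal sketch of such a re-implementation, modeled on the way BART-style models in transformers resize their final_logits_bias buffer. The helper name resize_final_logits_bias is hypothetical, and it assumes final_logits_bias is kept as a (1, vocab_size) registered buffer as in modeling_cpt.

import torch

def resize_final_logits_bias(model, new_num_tokens):
    # Hypothetical helper (not part of modeling_cpt): grow or shrink the
    # registered buffer so it matches the resized embedding matrix.
    old_num_tokens = model.final_logits_bias.shape[-1]
    if new_num_tokens <= old_num_tokens:
        new_bias = model.final_logits_bias[:, :new_num_tokens]
    else:
        extra = torch.zeros(
            (1, new_num_tokens - old_num_tokens),
            device=model.final_logits_bias.device,
        )
        new_bias = torch.cat([model.final_logits_bias, extra], dim=1)
    model.register_buffer("final_logits_bias", new_bias)

# Usage after adding tokens:
# model.resize_token_embeddings(len(t))
# resize_final_logits_bias(model, len(t))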
Thanks. May I keep this issue open for a while? I'm continuing with the fine-tuning and may run into more issues soon...