Can I finetune CPTForMaskedLM?
First, I would like to thank you for the great work. Much appreciated.
As stated in the title, I would like to try fine-tuning CPTForMaskedLM, and I'm not sure whether I can just fine-tune the decoder by training on the output logits. Sorry for this naive question, as I'm new to this field. Thank you.
Sure! If you calculate the loss on the G-Dec logits, you can fine-tune both the CPT Encoder and the G-Decoder; in this case the U-Decoder is not used. If you want to tune only the G-Dec and leave the Encoder unchanged, you can freeze the Encoder by not passing its parameters to the optimizer, so that only the G-Dec parameters are updated.
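For example, here is a minimal sketch of freezing the Encoder and optimizing only the remaining parameters. The attribute path model.model.encoder is an assumption based on the BART-style layout of modeling_cpt; adjust it if the module names differ.

import torch
from modeling_cpt import CPTForMaskedLM

model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large")

# Freeze every Encoder parameter so it receives no gradient updates.
# NOTE: model.model.encoder is an assumed attribute path, not confirmed by the repo.
for p in model.model.encoder.parameters():
    p.requires_grad = False

# Hand the optimizer only the parameters that are still trainable (G-Dec, lm_head, ...).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)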
Thanks a lot for your reply. During the fine-tuning of CPTForMaskedLM, I need to add tokens to the tokenizer (BertTokenizer.from_pretrained("fnlp/cpt-large")) by calling tokenizer.add_tokens; afterwards, I call model.resize_token_embeddings. All good up to this point, but when I call the model's forward pass, the dimension of final_logits_bias does not match. My sample code and output are below (I omitted some unnecessary details):
Code:
from transformers import BertTokenizer
from modeling_cpt import CPTForMaskedLM

model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large").cuda()
t = BertTokenizer.from_pretrained("fnlp/cpt-large")
t.add_tokens(["[SPL]"])                 # register the new token
model.resize_token_embeddings(len(t))   # resize the shared embedding matrix
model(input_ids=...)                    # forward pass raises the error below
Output:
>>> from modeling_cpt import CPTForMaskedLM
>>> model = CPTForMaskedLM.from_pretrained("fnlp/cpt-large").cuda()
rge")
t.add_tokens(["[SPL]"])
model.resize_token_embeddings(len(t))
>>> t = BertTokenizer.from_pretrained("fnlp/cpt-large")
>>> t.add_tokens(["[SPL]"])
1
>>> model.resize_token_embeddings(len(t))
Embedding(51272, 1024)
>>> model(input_ids=...)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File ".../modeling_cpt.py", line 1497, in forward
dec_logits = self.lm_head(hidden_states) + self.final_logits_bias
RuntimeError: The size of tensor a (51272) must match the size of tensor b (51271) at non-singleton dimension 2
I brute-forced a fix by matching the dimension of self.final_logits_bias with model.register_buffer("final_logits_bias", torch.zeros((1, model.model.shared.num_embeddings)).cuda()). I wonder whether I can do that or if there is a better way. Any hints? Thanks a lot.
This fix is OK, since final_logits_bias is not trained and is always zero. The functions for adding new tokens to CPT are not implemented, as they are not needed for pre-training or fine-tuning. A more elegant option would be to re-implement resize_token_embeddings for CPT, if you want (see the sketch below).
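For reference, here is a minimal sketch of such a re-implementation, modeled on the way BART-style models in transformers resize their final_logits_bias buffer. The helper name resize_final_logits_bias is hypothetical, and it assumes final_logits_bias is kept as a (1, vocab_size) registered buffer as in modeling_cpt.

import torch

def resize_final_logits_bias(model, new_num_tokens):
    # Hypothetical helper (not part of modeling_cpt): grow or shrink the
    # registered buffer so it matches the resized embedding matrix.
    old_num_tokens = model.final_logits_bias.shape[-1]
    if new_num_tokens <= old_num_tokens:
        new_bias = model.final_logits_bias[:, :new_num_tokens]
    else:
        extra = torch.zeros(
            (1, new_num_tokens - old_num_tokens),
            device=model.final_logits_bias.device,
        )
        new_bias = torch.cat([model.final_logits_bias, extra], dim=1)
    model.register_buffer("final_logits_bias", new_bias)

# Usage after adding tokens:
# model.resize_token_embeddings(len(t))
# resize_final_logits_bias(model, len(t))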
Thanks. May I keep this issue open for a while? I'm continuing with the fine-tuning and may run into more issues soon...