DialoGPT
Why doesn't the model input include attention_mask?
https://github.com/microsoft/DialoGPT/blob/fa0c0c53a0e6d75b6541e50faa2d77ba480b27d9/LSP_train.py#L281
Since it is an LMHeadModel, the 1st to n-th tokens are used to predict the (n+1)-th token during training, so why not introduce attention_mask to mask out the (n+2)-th to (n+m)-th tokens? Without attention_mask, there may be an inconsistency between the training and testing scenarios. Would it be possible to add attention_mask during training to improve test-time behavior?
Because GPT is a uni-directional (causal) language model, each token can only attend to the tokens before it, so it does not need an attention mask to hide future tokens.
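A minimal sketch of this point, assuming the Hugging Face transformers GPT-2 implementation (the sentence and slice length are arbitrary): the logits at a given position are unaffected by any tokens that come after it, so no explicit attention_mask is needed to hide "future" tokens during training.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")

with torch.no_grad():
    full_logits = model(ids).logits            # run on the whole sequence
    prefix_logits = model(ids[:, :3]).logits   # run on the first 3 tokens only

# The logits for the first 3 positions are the same whether or not the later
# tokens are present: position i only attends to positions <= i.
print(torch.allclose(full_logits[:, :3], prefix_logits, atol=1e-4))
```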
Why is the response concatenated to the input_ids for both the train and validation datasets? Wouldn't this create over-fitted models? Would it be possible to somehow mask the response ids?
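For reference, a minimal sketch (using Hugging Face transformers; not necessarily what the linked LSP_train.py does) of how tokens can be excluded from the LM loss even though they remain in input_ids: positions whose label is -100 are ignored by the cross-entropy loss, so the model is still conditioned on them but never fitted to them. The example strings are made up for illustration.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = tokenizer.encode("How are you?" + tokenizer.eos_token)
response = tokenizer.encode("I'm fine, thanks." + tokenizer.eos_token)

input_ids = torch.tensor([context + response])
# Mask the context positions with -100 so loss is computed only on the response;
# to mask the response instead, as asked above, swap which half gets -100.
labels = torch.tensor([[-100] * len(context) + response])

out = model(input_ids, labels=labels)
print(out.loss)  # loss over the un-masked positions only
```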