DialoGPT
Why doesn't the model input include attention_mask?
https://github.com/microsoft/DialoGPT/blob/fa0c0c53a0e6d75b6541e50faa2d77ba480b27d9/LSP_train.py#L281
Since it is an LMHeadModel, the 1st to n-th tokens are used to predict the (n+1)-th token during training, so why not introduce attention_mask to mask out the (n+2)-th to (n+m)-th tokens? Without attention_mask, there may be an inconsistency between the training and testing scenarios. Would it be possible to add attention_mask during training to improve test-time behavior?
Because GPT is a uni-directional (causal) language model, each token can only attend to the tokens before it, so it does not need an attention mask to hide future tokens.
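A minimal sketch of this point, assuming the Hugging Face transformers GPT-2 implementation (the sentence and slice length are arbitrary): the logits at a given position are unaffected by any tokens that come after it, so no explicit attention_mask is needed to hide "future" tokens during training.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("Hello, how are you?", return_tensors="pt")

with torch.no_grad():
    full_logits = model(ids).logits            # run on the whole sequence
    prefix_logits = model(ids[:, :3]).logits   # run on the first 3 tokens only

# The logits for the first 3 positions are the same whether or not the later
# tokens are present: position i only attends to positions <= i.
print(torch.allclose(full_logits[:, :3], prefix_logits, atol=1e-4))
```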
Why is the response concatenated to the input_ids for both the train and validation datasets? Wouldn't this create over-fitted models? Would it be possible to somehow mask the response ids?
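For reference, a minimal sketch (using Hugging Face transformers; not necessarily what the linked LSP_train.py does) of how tokens can be excluded from the LM loss even though they remain in input_ids: positions whose label is -100 are ignored by the cross-entropy loss, so the model is still conditioned on them but never fitted to them. The example strings are made up for illustration.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = tokenizer.encode("How are you?" + tokenizer.eos_token)
response = tokenizer.encode("I'm fine, thanks." + tokenizer.eos_token)

input_ids = torch.tensor([context + response])
# Mask the context positions with -100 so loss is computed only on the response;
# to mask the response instead, as asked above, swap which half gets -100.
labels = torch.tensor([[-100] * len(context) + response])

out = model(input_ids, labels=labels)
print(out.loss)  # loss over the un-masked positions only
```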