CRSLab
Should we specify attention_mask when using gpt2 for the conversation task?
Here's an example using the ReDial dataset.
We apply a padding function so that all dialogues have the same length and can be processed in batches.
- Dataloader process: code
To make GPT-2 pay no attention to the padding tokens, should we specify attention_mask when using GPT-2 for the conversation task?
Also, since we calculate the loss only on the response, should the padding labels be set to -100 rather than 0 (code) so that the model ignores them?
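To make the question concrete, here is a minimal sketch of what I mean (not the actual CRSLab dataloader; the texts and the pad-token choice are just illustrative): the attention_mask marks real tokens versus padding, and padding positions in the labels are set to -100 so the cross-entropy loss skips them.

```python
# Minimal sketch: attention_mask + -100 labels for padded GPT-2 batches.
# Illustrative only; not CRSLab's dataloader code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

texts = ["Hello, can you recommend a movie?", "Sure, have you seen Inception?"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

input_ids = batch["input_ids"]
attention_mask = batch["attention_mask"]  # 1 for real tokens, 0 for padding

# Use the inputs as labels, but mask padding with -100 so that
# the cross-entropy loss ignores those positions.
labels = input_ids.clone()
labels[attention_mask == 0] = -100
# (If the loss should cover only the response, the context token
# positions would be set to -100 in the same way.)

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(outputs.loss)
```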
I think when doing batch generation, we also need to pass the position_ids to the model. Otherwise, the position ids will always be 1 in the loop except for the first round. You can find the discussion [here](https://github.com/huggingface/transformers/issues/3021#issuecomment-591418233).
https://github.com/RUCAIBox/CRSLab/blob/b3ab262a4ad93cbae98fe66541eb735377768a35/crslab/model/conversation/gpt2/gpt2.py#L98
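For illustration, here is a minimal sketch of the pattern from that discussion, assuming left padding; the tensors are made up and this is not CRSLab's code:

```python
# Build position_ids from the attention_mask so that padding does not
# shift the positions of the real tokens (assumes left padding).
import torch

attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])  # 0 = padding

position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)  # value at pads is arbitrary
print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])

# When generating with past_key_values, the attention_mask is extended each
# step, position_ids is recomputed the same way, and only its last column
# is passed to the model.
```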
And why do you cut off the context when doing the generation? https://github.com/RUCAIBox/CRSLab/blob/b3ab262a4ad93cbae98fe66541eb735377768a35/crslab/model/conversation/gpt2/gpt2.py#L95
Thanks for your feedback. @Oran-Ac
- The padding labels are set to -100 according to your suggestion.
- Due to the limitation of GPU memory, we cut off the context when doing the generation (see the sketch after this list).
- The position_ids will be automatically added by Transformers if no parameter is passed.
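For illustration, a minimal sketch of cutting off the context, assuming we keep only the most recent tokens; max_context_len is a hypothetical name here, not an actual CRSLab config value:

```python
# Keep only the last max_context_len tokens of the context to bound
# GPU memory during generation. Illustrative sketch only.
import torch

max_context_len = 256
context_ids = torch.randint(0, 50257, (1, 400))  # a fake over-long context

truncated = context_ids[:, -max_context_len:]  # keep the most recent tokens
print(truncated.shape)  # torch.Size([1, 256])
```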