CRSLab
Should we specify attention_mask when using gpt2 for the conversation task?
Here's an example using the ReDial dataset.
We apply a padding function so that all dialogues have the same length and can be processed in batches.
- Dataloader process: code
To make GPT-2 pay no attention to the padding tokens, should we specify attention_mask when using GPT-2 for the conversation task?
Also, since we calculate the loss only on the response, should the padding labels be set to -100 rather than 0 (code) so that the model ignores them?
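To make the question concrete, here is a minimal sketch of what I mean (not the actual CRSLab dataloader; the texts and the pad-token choice are just illustrative): the attention_mask marks real tokens versus padding, and padding positions in the labels are set to -100 so the cross-entropy loss skips them.

```python
# Minimal sketch: attention_mask + -100 labels for padded GPT-2 batches.
# Illustrative only; not CRSLab's dataloader code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

texts = ["Hello, can you recommend a movie?", "Sure, have you seen Inception?"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

input_ids = batch["input_ids"]
attention_mask = batch["attention_mask"]  # 1 for real tokens, 0 for padding

# Use the inputs as labels, but mask padding with -100 so that
# the cross-entropy loss ignores those positions.
labels = input_ids.clone()
labels[attention_mask == 0] = -100
# (If the loss should cover only the response, the context token
# positions would be set to -100 in the same way.)

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
print(outputs.loss)
```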
I think when doing batch generation, we also need to pass the position_ids to the model. Otherwise, the position ids will always be 1 in the loop except for the first round. You can find the discussion [here](https://github.com/huggingface/transformers/issues/3021#issuecomment-591418233).
https://github.com/RUCAIBox/CRSLab/blob/b3ab262a4ad93cbae98fe66541eb735377768a35/crslab/model/conversation/gpt2/gpt2.py#L98
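For illustration, here is a minimal sketch of the pattern from that discussion, assuming left padding; the tensors are made up and this is not CRSLab's code:

```python
# Build position_ids from the attention_mask so that padding does not
# shift the positions of the real tokens (assumes left padding).
import torch

attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])  # 0 = padding

position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)  # value at pads is arbitrary
print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])

# When generating with past_key_values, the attention_mask is extended each
# step, position_ids is recomputed the same way, and only its last column
# is passed to the model.
```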
And why do you cut off the context when doing the generation? https://github.com/RUCAIBox/CRSLab/blob/b3ab262a4ad93cbae98fe66541eb735377768a35/crslab/model/conversation/gpt2/gpt2.py#L95
Thanks for your feedback. @Oran-Ac
- The padding labels are set to -100 according to your suggestion.
- Due to the limitation of GPU memory, we cut off the context when doing the generation (see the sketch after this list).
- The position_ids will be automatically added by Transformers if no parameter is passed.
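For illustration, a minimal sketch of cutting off the context, assuming we keep only the most recent tokens; max_context_len is a hypothetical name here, not an actual CRSLab config value:

```python
# Keep only the last max_context_len tokens of the context to bound
# GPU memory during generation. Illustrative sketch only.
import torch

max_context_len = 256
context_ids = torch.randint(0, 50257, (1, 400))  # a fake over-long context

truncated = context_ids[:, -max_context_len:]  # keep the most recent tokens
print(truncated.shape)  # torch.Size([1, 256])
```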