
Tokens in multi-turn setting

Open ferdinando17 opened this issue 5 years ago • 12 comments

Hi, thanks for making the work available and for the explanations.

From the paper I understand that a training instance is a dialogue session, made up of several dialogue turns concatenated and ended by the end-of-text token.

Based on this and on what dreasysnail says in Issue #17:

There ARE special tokens (<|endoftext|>, id=50256) between dialogue turns in multi-turn setup. Your input format should be like this:

Turn1 <|endoftext|> Turn2 <|endoftext|> ... TurnN

my question is:

are the tokens between different dialogue turns the same as the tokens separating whole dialogue sessions?

Thank you

ferdinando17 · Feb 06 '20


If I understand right, there are NO tokens between dialogue sessions, because one dialogue session is one training example and contains a source (utt1 <|eos|> utt2 <|eos|> utt3) and a target (utt4). The next session is passed to the model as another training sample.
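To make that concrete, here is a minimal sketch of splitting one session into a source/target pair; the helper name and the " <|eos|> " separator are just taken from the example above, not from the DialoGPT repo:

```python
# Illustration only (helper name and separator string are assumptions).
def session_to_example(turns, separator=" <|eos|> "):
    """Split one dialogue session into a (source, target) training pair."""
    source = separator.join(turns[:-1])   # utt1 <|eos|> utt2 <|eos|> utt3
    target = turns[-1]                    # utt4
    return source, target

src, tgt = session_to_example(["utt1", "utt2", "utt3", "utt4"])
print(src)   # utt1 <|eos|> utt2 <|eos|> utt3
print(tgt)   # utt4
# The next session becomes a separate training example; nothing joins two sessions.
```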

liehtman · Feb 13 '20

Thank you liehtman, this is very helpful.

My current, updated understanding is that the .tsv file must be in the format you described, with a \t between the source (utt1 <|eos|> utt2 <|eos|> utt3) and the target (utt4).

Then prepro.py will create the features, which end with an <|endoftext|> token (id=50256).
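A rough illustration of that last step (this is NOT the actual prepro.py code; the tokenizer, the example line, and the exact encoding details are assumptions):

```python
# Sketch of the described behaviour: split one .tsv line into source/target,
# encode both, and append the end-of-text id so the feature ends with 50256.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

line = "utt1 <|endoftext|> utt2 <|endoftext|> utt3\tutt4"   # one .tsv line
source, target = line.rstrip("\n").split("\t")

input_ids = tokenizer.encode(source) + tokenizer.encode(" " + target)
input_ids.append(tokenizer.eos_token_id)   # feature ends with <|endoftext|> (50256)

print(input_ids[-1])   # 50256
```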

ferdinando17 · Feb 13 '20

Here, interested too.

GraphGrailAi · Apr 05 '20

I successfully managed to fine-tune the model with input data in this form: each line of the .tsv file is a dialogue, with each turn separated by <|eos|> and a tab that separates the target from the rest of the dialogue.

A sample training instance is therefore: utt1 <|eos|> utt2 <|eos|> utt3 \t target \n
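In other words, each dialogue becomes one line of the .tsv, along these lines (the file name and example dialogues are placeholders):

```python
# Illustrative only: write dialogues to a .tsv file in the format above.
dialogues = [
    ["utt1", "utt2", "utt3", "target"],
    ["how are you?", "fine, thanks", "glad to hear it"],
]

with open("train.tsv", "w", encoding="utf-8") as f:
    for turns in dialogues:
        source = " <|eos|> ".join(turns[:-1])   # utt1 <|eos|> utt2 <|eos|> utt3
        f.write(source + "\t" + turns[-1] + "\n")
```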

ferdinando17 · Apr 06 '20

Hi @ferdinando17. I am trying to fine-tune the model with my own dataset. I failed to run python demo.py --data small, so I can't see the exact format of the .tsv file. After reading some of the code, I agree with your opinion. Could you please help me confirm whether the format of my dataset (.tsv file) is correct:

0.0 utt1 EOS 1.0 utt2 EOS 1.0 utt3 \t 1.0 i am a admin .\n

Hope to get your reply. Thanks.

LooperXX · Apr 28 '20

Hi, you are missing the tab, it should be "0.0 utt1 0.0 EOS utt2 0.0 EOS utt3 \t 1.0 i am a admin .\n"

to ask DialoGPT to predict "i am a admin .". Look at my example above.

Also, the zeros mean you are not training on the utterances that follow them; is that what you want?
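For readability, here is a hypothetical helper that builds such a weighted line. The exact layout (one weight prefixed to every utterance, EOS between utterances) follows the examples in this thread and is an assumption, not something I have checked against demo.py:

```python
# Hypothetical helper: build one weighted .tsv line from (weight, utterance) pairs.
def weighted_line(context, target):
    """context is a list of (weight, utterance) pairs; target is one such pair."""
    src = " EOS ".join(f"{w} {utt}" for w, utt in context)
    tgt_w, tgt_utt = target
    return f"{src}\t{tgt_w} {tgt_utt}\n"

print(weighted_line([(0.0, "utt1"), (1.0, "utt2"), (1.0, "utt3")],
                    (1.0, "i am a admin .")), end="")
# 0.0 utt1 EOS 1.0 utt2 EOS 1.0 utt3	1.0 i am a admin .
```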

ferdinando17 · Apr 28 '20

Hi @ferdinando17, this is what bothers me. In multi-turn dialogue, we have several previous turns as context, one user turn as the question, and one system turn as the answer. From your explanation, I realized that it should be

0.0 utt1 EOS 1.0 utt2 EOS 1.0 utt3 \t 1.0 i am a admin .\n

as the example format in the training/fine-tuning dataset, where only the first sentence should be 0.0 and the remaining sentences should be 1.0, so that the model is trained/fine-tuned regardless of whether a turn is a user or a system turn. (Actually, I am confused about whether I should distinguish between user and system turns: 0.0 for user turns and 1.0 for system turns, so that the model only needs to predict each system turn, since the model only needs to predict the system utterance in evaluation. But maybe all 1.0 would help train the model with more data.) Is that correct? Hope to get your reply. Thanks. 🙏

LooperXX · Apr 29 '20

Are you applying it to task-oriented dialogue?

I understand that the 0.0 weights are for sentences you want to filter out; the authors used them to avoid training on offensive language. I used all 1.0, and my training instances were of the form I specified, where the target was always a system turn.

I hope it makes sense.

ferdinando17 · May 05 '20

Hi @ferdinando17. Thank you for your reply. Yes, I am trying to apply it to task-oriented dialogue. In my understanding, 0.0 makes the model not predict that sentence, and 1.0 makes the model predict it. So I think it is OK to train the model by marking the first sentence of each multi-turn dialogue 0.0, as context information, and marking the rest 1.0. Alternatively, we could mark every user turn 0.0 and every system turn 1.0. Maybe more experiments with the two settings are needed. Thanks again for your reply.

LooperXX · May 06 '20

OK, I see. I disagree, but of course I might be wrong. In this issue, another user says 0.0 causes the sentences to be ignored during training. They refer to the Hugging Face docs too.

Let me know if you find evidence to the contrary.
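My (unverified) reading of how a 0.0 weight could be applied: the tokens of a 0.0-weighted utterance simply contribute nothing to the language-modelling loss. Roughly like this, where none of the names come from the DialoGPT repo and -100 is the ignore index used by PyTorch's cross-entropy loss:

```python
# Hypothetical sketch: mask 0.0-weighted utterances out of the LM loss.
import torch

def build_labels(utterance_ids, weights):
    """utterance_ids: one list of token ids per utterance; weights: matching 0.0/1.0."""
    input_ids, labels = [], []
    for ids, w in zip(utterance_ids, weights):
        input_ids.extend(ids)
        labels.extend(ids if w > 0 else [-100] * len(ids))  # -100 tokens are ignored
    return torch.tensor([input_ids]), torch.tensor([labels])

input_ids, labels = build_labels([[11, 12], [13, 14, 15]], [0.0, 1.0])
print(labels)  # tensor([[-100, -100,   13,   14,   15]])
```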

ferdinando17 · May 06 '20

Hi guys, how do I deal with datasets like this: person1: utt1, person2: utt2, person1: utt3 ...? Referring to what you all said, I think it should look like this: 1.0 utt1 EOS 1.0 utt2 EOS \t 1.0 utt3

Is this correct?
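A small sketch of how such a person1/person2 dialogue could be turned into the line proposed above (giving every utterance a weight of 1.0 is my assumption):

```python
# Illustrative only: convert alternating person1/person2 turns into the proposed line.
turns = ["utt1", "utt2", "utt3"]   # person1, person2, person1, ...

source = " EOS ".join(f"1.0 {utt}" for utt in turns[:-1])
line = f"{source}\t1.0 {turns[-1]}\n"
print(line, end="")   # 1.0 utt1 EOS 1.0 utt2	1.0 utt3
```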

minmummax · Oct 09 '22

Also, I just wonder what the validation set should look like.

minmummax · Oct 09 '22