DeepSpeedExamples fix ppo_trainer generate and scores calculation in stage 2

A quick fix for bugs I see when go through the code

Wrong scores calculation in step2 reward model training It might related to issue334 https://github.com/microsoft/DeepSpeedExamples/issues/334
Wrongly setting min generation length causes repeated generation It might related to issue318 https://github.com/microsoft/DeepSpeedExamples/issues/318, issue324 https://github.com/microsoft/DeepSpeedExamples/issues/324

Apr 19 '23 02:04 nepetune233

@microsoft-github-policy-service agree

Apr 19 '23 03:04 nepetune233

I also tried to add an break point here https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L97

and I have

(Pdb) tokenizer.batch_decode(seq[0:1,:])
['<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>\n\nHuman: Why is the Book of Enoch not always included in official Bible canon?\n\nAssistant: The Book of Enoch is one of the most popularly quoted books in the Bible. The problem is, it isn’t considered canonical by the main branch of Christianity, and also isn’t widely accepted in the Jewish tradition. It’s an ancient text, but most scholars believe its contents are unreliable. It’s considered a scripture, but not a part of the accepted biblical canon.\n\nHuman: Who originally wrote the book of Enoch?\n\nAssistant:<|endoftext|>\r��\r��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������']

is this what we expected? @yaozhewei @nepetune233

Apr 27 '23 06:04 DanqingZ

@DanqingZ Here is result on my side. I am confused about why your generated seems to be padded in left side.

Apr 27 '23 07:04 nepetune233

could you show me the code how you decode seq?

Apr 27 '23 08:04 DanqingZ

Apr 28 '23 01:04 nepetune233

@nepetune233 would you mind exchanging contact information with me. My email address is [email protected]. You can send me an email. Thank you!

Apr 28 '23 02:04 DanqingZ

@nepetune233 I see, I was using the output of _generate_sequence: https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L68-L93

Apr 28 '23 02:04 DanqingZ

I tried to add break point in the ppo trainer code like this

at first, the output is like this, and the reward is below 0

later on, after several steps, the LLM starts to output repeated token...and the reward is above 0

Apr 28 '23 06:04 DanqingZ

Thanks for all good investigation. We are trying to reproduce these issues now.

Apr 28 '23 16:04 yaozhewei

I also tried to add an break point here https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L97

and I have

(Pdb) tokenizer.batch_decode(seq[0:1,:])
['<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>\n\nHuman: Why is the Book of Enoch not always included in official Bible canon?\n\nAssistant: The Book of Enoch is one of the most popularly quoted books in the Bible. The problem is, it isn’t considered canonical by the main branch of Christianity, and also isn’t widely accepted in the Jewish tradition. It’s an ancient text, but most scholars believe its contents are unreliable. It’s considered a scripture, but not a part of the accepted biblical canon.\n\nHuman: Who originally wrote the book of Enoch?\n\nAssistant:<|endoftext|>\r��\r��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������']

is this what we expected? @yaozhewei @nepetune233

Apr 28 '23 17:04 yaozhewei

I also tried to add an break point here https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L97

and I have

(Pdb) tokenizer.batch_decode(seq[0:1,:])
['<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>\n\nHuman: Why is the Book of Enoch not always included in official Bible canon?\n\nAssistant: The Book of Enoch is one of the most popularly quoted books in the Bible. The problem is, it isn’t considered canonical by the main branch of Christianity, and also isn’t widely accepted in the Jewish tradition. It’s an ancient text, but most scholars believe its contents are unreliable. It’s considered a scripture, but not a part of the accepted biblical canon.\n\nHuman: Who originally wrote the book of Enoch?\n\nAssistant:<|endoftext|>\r��\r��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������']

is this what we expected? @yaozhewei @nepetune233

Yes, it is. Please see the reply to @nepetune233. We are investigating this

Apr 28 '23 17:04 yaozhewei

@yaozhewei Thanks for your detailed explanation! I also read through the data_utils.py again and found the issue.

In the code, we actually do the padding twice. First time, we pad the prompt sequence to the same length. Second time, we pad the prompt sequence to max token length. The padding code for second time is wrong. The origin code:

	batch["prompt"] = F.pad(prompt,
	pad=(pad_length, 0),
	mode='constant',
	value=pad_token_id)

It will create padding on the left side, although we will do flip in the following code. So the code should be

	batch["prompt"] = F.pad(prompt,
	pad=(0, pad_length),
	mode='constant',
	value=pad_token_id)

I have updated the pull request.

Apr 29 '23 02:04 nepetune233

solved by #426 #457 #468.

May 08 '23 07:05 nepetune233

DeepSpeedExamples DeepSpeedExamples copied to clipboard

fix ppo_trainer generate and scores calculation in stage 2

A quick fix for bugs I see when go through the code

DeepSpeedExamples
DeepSpeedExamples copied to clipboard