fix ppo_trainer generate and scores calculation in stage 2

Open nepetune233 opened this issue 2 years ago • 1 comments

A quick fix for bugs I found while going through the code:

  1. Wrong score calculation in step 2 reward model training. It might be related to issue 334: https://github.com/microsoft/DeepSpeedExamples/issues/334
  2. An incorrectly set minimum generation length causes repeated generation. It might be related to issue 318 https://github.com/microsoft/DeepSpeedExamples/issues/318 and issue 324 https://github.com/microsoft/DeepSpeedExamples/issues/324

nepetune233 avatar Apr 19 '23 02:04 nepetune233

@microsoft-github-policy-service agree

nepetune233 avatar Apr 19 '23 03:04 nepetune233

I also tried adding a breakpoint here: https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L97

and I have

(Pdb) tokenizer.batch_decode(seq[0:1,:])
['<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>\n\nHuman: Why is the Book of Enoch not always included in official Bible canon?\n\nAssistant: The Book of Enoch is one of the most popularly quoted books in the Bible. 
The problem is, it isn’t considered canonical by the main branch of Christianity, and also isn’t widely accepted in the Jewish tradition. It’s an ancient text, but most scholars believe its contents are unreliable. It’s considered a scripture, but not a part of the accepted biblical canon.\n\nHuman: Who originally wrote the book of Enoch?\n\nAssistant:<|endoftext|>\r��\r��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������']

is this what we expected? @yaozhewei @nepetune233

DanqingZ avatar Apr 27 '23 06:04 DanqingZ

@DanqingZ image Here is the result on my side. I am confused about why your generated sequence seems to be padded on the left side.

nepetune233 avatar Apr 27 '23 07:04 nepetune233

Could you show me the code you use to decode seq?

DanqingZ avatar Apr 27 '23 08:04 DanqingZ

image

nepetune233 avatar Apr 28 '23 01:04 nepetune233

@nepetune233 Would you mind exchanging contact information with me? My email address is [email protected]. You can send me an email. Thank you!

DanqingZ avatar Apr 28 '23 02:04 DanqingZ

@nepetune233 I see, I was using the output of _generate_sequence: https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L68-L93

DanqingZ avatar Apr 28 '23 02:04 DanqingZ

I tried adding a breakpoint in the PPO trainer code like this: image

At first, the output looks like this, and the reward is below 0:

image

Later, after several steps, the LLM starts to output repeated tokens, and the reward is above 0:

image

DanqingZ avatar Apr 28 '23 06:04 DanqingZ

Thanks for all the good investigation. We are trying to reproduce these issues now.

yaozhewei avatar Apr 28 '23 16:04 yaozhewei

is this what we expected? @yaozhewei @nepetune233

Yes, it is. Please see the reply to @nepetune233. We are investigating this.

yaozhewei avatar Apr 28 '23 17:04 yaozhewei

@yaozhewei Thanks for your detailed explanation! I also read through data_utils.py again and found the issue.

In the code, we actually pad twice. The first time, we pad the prompt sequences to the same length. The second time, we pad the prompt sequences to the max token length. The padding code for the second step is wrong. The original code:

	batch["prompt"] = F.pad(prompt,
	                        pad=(pad_length, 0),
	                        mode='constant',
	                        value=pad_token_id)

This pads on the left side, but the code that follows flips the sequence, so the padding ends up on the wrong side. The code should be:

	batch["prompt"] = F.pad(prompt,
	                        pad=(0, pad_length),
	                        mode='constant',
	                        value=pad_token_id)

I have updated the pull request.
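
To illustrate why the pad side matters when a flip follows, here is a minimal sketch using plain Python lists so it runs without PyTorch. The helper `pad` is a hypothetical stand-in for `F.pad(seq, pad=(left, right), value=...)`, `collate` and the token ids are made up, and it assumes the prompt tokens are reversed before padding and flipped back afterwards, which the thread only hints at.

```python
PAD = 0  # hypothetical pad token id, for illustration only


def pad(seq, left, right, value=PAD):
    """List-based stand-in for F.pad(seq, pad=(left, right), value=value)."""
    return [value] * left + list(seq) + [value] * right


def collate(prompt_tokens, max_len, buggy):
    """Pad a reversed prompt to max_len, then flip it back."""
    flipped = prompt_tokens[::-1]             # assumption: prompts are reversed first
    pad_length = max_len - len(flipped)
    if buggy:
        padded = pad(flipped, pad_length, 0)  # mirrors pad=(pad_length, 0)
    else:
        padded = pad(flipped, 0, pad_length)  # mirrors pad=(0, pad_length)
    return padded[::-1]                       # the flip that follows the padding


prompt = [11, 12, 13]
buggy_result = collate(prompt, 6, buggy=True)
fixed_result = collate(prompt, 6, buggy=False)
print(buggy_result)  # [11, 12, 13, 0, 0, 0]
print(fixed_result)  # [0, 0, 0, 11, 12, 13]
```

Under these assumptions, padding on the left before the flip leaves the pad tokens to the right of the prompt, while padding on the right before the flip leaves them on the left.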

nepetune233 avatar Apr 29 '23 02:04 nepetune233

Solved by #426, #457, and #468.

nepetune233 avatar May 08 '23 07:05 nepetune233