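In InstructGPT, reward model (RM) training groups all of the <prompt, response> pairs for a given prompt into a single batch element when computing the loss: the K ranked responses to one prompt yield C(K, 2) pairwise comparisons, and these are averaged together. Is this also implemented in DeepSpeed-Chat?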
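
To make the grouping in the question concrete, here is a minimal sketch. It assumes the InstructGPT-style pairwise loss, loss = -1/C(K, 2) * E[log sigmoid(r(x, y_w) - r(x, y_l))], where y_w is the preferred response in each pair. The function names are hypothetical, plain floats stand in for reward-model outputs, and this is not DeepSpeed-Chat's code.

```python
import math
from itertools import combinations


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def grouped_pairwise_rm_loss(rewards_by_rank):
    """Loss for ONE prompt (hypothetical helper).

    rewards_by_rank: scalar rewards for the K responses to a single prompt,
    ordered from most preferred (index 0) to least preferred (index K-1).
    All C(K, 2) comparisons are averaged into one term, i.e. the whole
    prompt is treated as a single batch element.
    """
    k = len(rewards_by_rank)
    if k < 2:
        raise ValueError("need at least two ranked responses")
    # combinations yields (i, j) with i < j, so response i is ranked above j
    pairs = list(combinations(range(k), 2))
    total = sum(
        -log_sigmoid(rewards_by_rank[i] - rewards_by_rank[j]) for i, j in pairs
    )
    return total / len(pairs)  # average over the C(K, 2) comparisons


def batch_loss(batch):
    """Mean over prompts; each prompt contributes one grouped term."""
    return sum(grouped_pairwise_rm_loss(r) for r in batch) / len(batch)


# Usage: two prompts, with 3 and 4 ranked responses respectively
print(batch_loss([[1.5, 0.2, -0.7], [2.0, 1.0, 0.5, -1.0]]))
```

The alternative the question contrasts with would shuffle every (y_w, y_l) comparison into the dataset as an independent example. Grouping by prompt keeps the correlated comparisons from one prompt together, which the InstructGPT paper describes as reducing overfitting and needing only one forward pass per response rather than one per comparison.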

BaiStone2017 opened this issue on Apr 17 '23 01:04.