DeepSpeedExamples
In InstructGPT, during reward model (RM) training, the different <prompt, response> pairs that share the same prompt are put together to calculate the loss. Is this also implemented in DeepSpeed-Chat?
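To make the question concrete, here is a minimal sketch of the grouping I mean, as I understand it from the InstructGPT paper: for one prompt with K ranked responses, all K-choose-2 comparisons are averaged into a single loss term, -log(sigmoid(r_w - r_l)) per pair. This is plain Python for illustration only, not DeepSpeed-Chat code; the names `log_sigmoid` and `grouped_pairwise_loss` are hypothetical, and the rewards are assumed to be scalars already produced by the reward model.

```python
import math
from itertools import combinations


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def grouped_pairwise_loss(rewards_ranked: list[float]) -> float:
    """Pairwise ranking loss over all responses to ONE prompt.

    rewards_ranked: scalar rewards for the K responses, ordered from the
    human-preferred best to worst (hypothetical input layout).
    """
    # Every (winner, loser) index pair; the lower index is ranked higher.
    pairs = list(combinations(range(len(rewards_ranked)), 2))
    if not pairs:
        raise ValueError("need at least two responses for one prompt")
    total = 0.0
    for w, l in pairs:
        total += log_sigmoid(rewards_ranked[w] - rewards_ranked[l])
    # Average over the K-choose-2 comparisons so each prompt counts once.
    return -total / len(pairs)


if __name__ == "__main__":
    # Rewards that agree with the ranking give a loss below log(2).
    print(grouped_pairwise_loss([2.0, 1.0, 0.0]))
```

The point of the grouping is that every comparison for a prompt contributes to the same loss term, rather than each pair being treated as an independent training example.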