RL-Chat-pytorch

The reward explosion problem

Open Tangzy7 opened this issue 7 years ago • 0 comments

Hi. I used only ease of answering as the reward during training. However, the reward diverged from about -2.x to -∞ over the course of training, even though I had already scaled the reward. Have you encountered this problem?
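For context, one possible cause of a reward reaching -∞ is taking the log of a probability that has underflowed to zero. The sketch below is only an illustration under that assumption, not the repository's code: the function names (`naive_log_prob`, `stable_log_prob`, `clipped_mean_log_prob`) and the `floor` parameter are hypothetical, and the sign convention and scaling of the actual reward may differ. It compares a naive log-of-softmax with a log-sum-exp version, then averages length-normalized log-probabilities over a set of responses with a lower bound.

```python
import math


def naive_log_prob(logits, index):
    """log(softmax(logits)[index]) computed the naive way (can hit -inf)."""
    exps = [math.exp(x) for x in logits]  # very negative logits underflow to 0.0
    p = exps[index] / sum(exps)
    return math.log(p) if p > 0.0 else float("-inf")


def stable_log_prob(logits, index):
    """Same quantity via the log-sum-exp trick, which avoids log(0)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_sum_exp


def clipped_mean_log_prob(token_log_probs_per_response, floor=-20.0):
    """Average of length-normalized log-probs, each bounded below by `floor`.

    `token_log_probs_per_response` is a list of lists: one list of per-token
    log-probabilities for each response (hypothetical input layout).
    """
    per_response = []
    for token_log_probs in token_log_probs_per_response:
        mean_lp = sum(token_log_probs) / len(token_log_probs)  # length normalization
        per_response.append(max(mean_lp, floor))  # keep the value finite
    return sum(per_response) / len(per_response)


if __name__ == "__main__":
    logits = [0.0, -800.0]
    print(naive_log_prob(logits, 1))   # probability underflows, so the log is -inf
    print(stable_log_prob(logits, 1))  # stays finite
    print(clipped_mean_log_prob([[-1.0, -3.0], [-800.0]]))
```

If the reward in question is built this way, checking for `-inf` or NaN values in the per-token log-probabilities before scaling would help confirm or rule out this explanation.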

Tangzy7, Jun 15 '18 15:06