RL-Chat-pytorch
The reward explosion problem
Hi. I just used ease of answering as the reward in training. However, I found that the reward exploded from about -2.x to -∞ during training, even though I had already scaled the reward. Have you run into this problem?
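
To make the question concrete, here is a minimal sketch of how an ease-of-answering reward can become unbounded and how clamping keeps it finite. It assumes the reward is the negative, length-normalized log-likelihood of a set of dull responses; the function name `ease_of_answering`, the `floor` parameter, and the sign convention are hypothetical and may not match this repository's code.

```python
import math


def ease_of_answering(dull_logprobs, floor=-20.0):
    """Hypothetical ease-of-answering reward (not the repository's code).

    dull_logprobs: one list of per-token log-probabilities for each dull
    response. Each token log-probability is clamped to `floor` so that a
    probability that underflows to 0 (log-prob of -inf) cannot make the
    reward infinite.
    """
    total = 0.0
    for token_lps in dull_logprobs:
        clamped = [max(lp, floor) for lp in token_lps]
        # Length-normalize so long responses are not penalized more.
        total += sum(clamped) / len(clamped)
    # Assumed sign convention: negative mean log-likelihood.
    return -total / len(dull_logprobs)


# A token with probability 0 would give an infinite reward without the clamp.
reward = ease_of_answering([[float("-inf"), -1.0]])
assert math.isfinite(reward)
```

If the reward still diverges with a clamp like this, the cause is probably elsewhere, for example in how the reward is scaled or how the policy is updated.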