Loss suddenly spikes to an extremely high value in a single step in the sentiment notebook
PPO training on the sentiment example proceeds normally at first: the loss decreases and the mean reward slowly increases.
At step 40, the loss suddenly jumps to about 1e+10, which causes the reward to drop sharply.
What causes this spike?
Nothing was changed compared to the sentiment example.

We have seen this a few times when the generation is very short (only 1-2 tokens). One way to force the model to always generate tokens is to set eos_token_id=-1, as done in the T5 example:
generation_kwargs = {"top_k": 0.0, "top_p": 1.0, "do_sample": True, "eos_token_id": -1}
Thanks! @lvwerra
generation_kwargs also accepts a min_length parameter:
generation_kwargs = {"min_length": 40}
This setting will lead to wrong KL values, right?
Another question: why does short generated text cause an anomaly in the loss computation?
I haven't had time to investigate this yet, but it's tracked in #101. Yes, min_length can lead to negative KL. The difference between this and eos_token_id=-1 is that with min_length the EOS token is suppressed, while with eos_token_id=-1 it can still be generated but generation simply continues.
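To make the difference between the two settings concrete, here is a minimal toy sketch. It is not the transformers implementation: the 3-token vocabulary, the EOS id, the fixed logits, and the generate helper are all hypothetical and only mimic the behavior described above. With min_length the EOS logit is masked to -inf, so tokens are sampled from a modified distribution. With eos_token_id=-1 nothing is masked, but no sampled token ever matches the stop condition.

```python
import math
import random

EOS = 2  # hypothetical EOS id in a toy 3-token vocabulary


def softmax(logits):
    """Numerically stable softmax over a list of floats (-inf maps to 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(logits, max_new_tokens, eos_token_id, min_length=0, seed=0):
    """Toy sampler: the same fixed logits are reused at every step."""
    rng = random.Random(seed)
    tokens = []
    for step in range(max_new_tokens):
        step_logits = list(logits)
        # min_length behavior: suppress EOS until enough tokens exist.
        if 0 <= eos_token_id < len(step_logits) and step < min_length:
            step_logits[eos_token_id] = -math.inf
        probs = softmax(step_logits)
        token = rng.choices(range(len(probs)), weights=probs)[0]
        tokens.append(token)
        # Stop condition: never true when eos_token_id is -1.
        if token == eos_token_id:
            break
    return tokens


# Toy logits for a model that strongly prefers to emit EOS right away.
eos_heavy_logits = [0.0, 0.0, 50.0]

short = generate(eos_heavy_logits, max_new_tokens=8, eos_token_id=EOS)
padded = generate(eos_heavy_logits, max_new_tokens=8, eos_token_id=EOS, min_length=4)
never_stops = generate(eos_heavy_logits, max_new_tokens=8, eos_token_id=-1)

print(len(short), len(padded), len(never_stops))
```

In this toy setup the default call stops almost immediately, the min_length call never samples EOS during its first four steps, and the eos_token_id=-1 call keeps sampling EOS tokens from the unmodified distribution until max_new_tokens is reached.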
Closing for now, feel free to re-open if there's an update.