trl: Loss suddenly spikes to an extremely high value in a single step in the sentiment notebook

The sentiment PPO training starts out normally: the loss is decreasing and the mean reward is increasing slowly.

At the 40th step, the loss suddenly increases to 1e+10, which causes the reward to drop sharply.

I want to know why this spike happens.

Nothing has been changed compared to the sentiment example.

[Screenshot 2023-03-08 20 28 28]

Mar 08 '23 12:03 alexixu

We have experienced this a few times when the generation is very short (only 1-2 tokens). One way to force the model to always generate tokens is to set eos_token_id=-1, as done in the T5 example:

generation_kwargs = {"top_k": 0.0, "top_p": 1.0, "do_sample": True, "eos_token_id": -1}
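To illustrate why eos_token_id=-1 prevents very short generations, here is a minimal toy sketch. It is not the transformers implementation: toy_generate is a hypothetical helper that samples token ids uniformly and stops only when the sampled id equals eos_token_id. Since no real token id is ever -1, the stop condition never fires and generation always runs to the length limit.

```python
import random


def toy_generate(eos_token_id, max_new_tokens=20, vocab_size=5, seed=0):
    """Hypothetical toy sampler: draw token ids uniformly at random and
    stop as soon as the sampled id equals eos_token_id."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(max_new_tokens):
        tok = rng.randrange(vocab_size)  # ids are in 0 .. vocab_size - 1
        tokens.append(tok)
        if tok == eos_token_id:  # never true when eos_token_id == -1
            break
    return tokens


# Assumption for this toy: token id 0 plays the role of the real EOS token.
with_real_eos = [toy_generate(eos_token_id=0, seed=s) for s in range(50)]
with_fake_eos = [toy_generate(eos_token_id=-1, seed=s) for s in range(50)]

print("shortest response with real EOS:", min(len(t) for t in with_real_eos))
print("shortest response with eos_token_id=-1:", min(len(t) for t in with_fake_eos))
```

With the real EOS id, some responses end after only a token or two; with eos_token_id=-1 every response has exactly max_new_tokens tokens, even though token 0 can still be sampled along the way.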

Mar 09 '23 13:03 lvwerra

Thanks! @lvwerra

The generation_kwargs has a min_length parameter: generation_kwargs = {"min_length": 40}. This setting will lead to a wrong KL value, right?

Another question: why does short generated text cause an anomaly in the loss computation?

Mar 10 '23 03:03 alexixu

I haven't had time to investigate this yet, but it's tracked in #101. Yes, min_length can lead to negative KL. The difference between this and eos_token_id=-1 is that with min_length the EOS token is suppressed, while with eos_token_id=-1 it can still be generated but the generation just continues.
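A toy calculation may help show how suppressing the EOS token can produce a negative KL estimate. This is only a sketch under assumptions: the three-token vocabulary and both distributions are made up, and the KL penalty is assumed to be estimated per sampled token as log p_policy(x) - log p_ref(x), with the log-probabilities taken from the unmodified model outputs. If the policy puts most of its mass on EOS but EOS is masked during sampling, the sampled tokens are ones the policy considers unlikely relative to the reference, so the estimate turns negative.

```python
import math

# Made-up single-step distributions over a three-token vocabulary.
EOS, A, B = "eos", "a", "b"
policy = {EOS: 0.90, A: 0.05, B: 0.05}     # policy strongly prefers EOS
reference = {EOS: 0.10, A: 0.45, B: 0.45}  # reference rarely emits EOS


def exact_kl(p, q):
    """Exact KL(p || q) for two discrete distributions."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def suppress(dist, banned):
    """Mask one token and renormalise, mimicking EOS suppression."""
    kept = {t: pr for t, pr in dist.items() if t != banned}
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}


def expected_kl_estimate(sampling_dist, p, q):
    """Expected value of log p(x) - log q(x) when x ~ sampling_dist."""
    return sum(w * (math.log(p[t]) - math.log(q[t]))
               for t, w in sampling_dist.items())


true_kl = exact_kl(policy, reference)                             # positive
on_policy = expected_kl_estimate(policy, policy, reference)       # equals true_kl
with_min_length = expected_kl_estimate(suppress(policy, EOS),
                                       policy, reference)         # negative

print(f"exact KL:                      {true_kl:.4f}")
print(f"estimate, unmodified sampling: {on_policy:.4f}")
print(f"estimate, EOS suppressed:      {with_min_length:.4f}")
```

When tokens are sampled from the policy itself, the estimator matches the true KL in expectation. Once sampling is altered by suppressing EOS, the samples no longer come from the distribution being scored, which is why the estimate can drop below zero.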

Mar 13 '23 18:03 lvwerra

Closing for now, feel free to re-open if there's an update.

Apr 14 '23 08:04 lvwerra