Loss suddenly spikes to an extremely high value for a single step in the sentiment notebook

Open alexixu opened this issue 2 years ago • 3 comments

The sentiment PPO training starts out normally: the loss is decreasing and the mean reward is increasing slowly.

At step 40, the loss suddenly jumps to 1e+10, which causes the reward to drop sharply.

I would like to know why this spike happens.

Nothing has been changed compared to the sentiment example.

Screenshot 2023-03-08 20 28 28

alexixu avatar Mar 08 '23 12:03 alexixu

We have experienced this a few times when the generation is very short (only 1-2 tokens). One way to force the model to always generate tokens is to set eos_token_id=-1, as done in the T5 example:

generation_kwargs = {"top_k": 0.0, "top_p": 1.0, "do_sample": True, "eos_token_id": -1}
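To check whether a loss spike coincides with very short generations, one option is to log the response lengths of each batch and flag the short ones. The following is only a minimal sketch: the helper name `find_short_responses` and the threshold `min_tokens=3` are hypothetical and not part of trl.

```python
def find_short_responses(response_lengths, min_tokens=3):
    """Return the batch indices of responses shorter than min_tokens.

    response_lengths: list of ints, number of generated tokens per response.
    min_tokens: hypothetical threshold; 3 flags the 1-2 token case.
    """
    return [i for i, n in enumerate(response_lengths) if n < min_tokens]


# Example batch: the responses at index 1 and 3 have only 1 and 2 tokens.
lengths = [12, 1, 9, 2, 15]
print(find_short_responses(lengths))
```

If the flagged indices line up with the step where the loss explodes, that would support the short-generation explanation for that run.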

lvwerra avatar Mar 09 '23 13:03 lvwerra

Thanks! @lvwerra

The generation_kwargs also has a min_length parameter, e.g. generation_kwargs = {"min_length": 40}. This setting will lead to a wrong KL value, right?

Another question: why does short generated text cause an anomaly in the loss computation?

alexixu avatar Mar 10 '23 03:03 alexixu

I haven't had time to investigate this yet, but it's tracked in #101. Yes, min_length can lead to negative KL. The difference between this and eos_token_id=-1 is that with min_length the EOS token is suppressed, while with the latter it can be generated but the generation just continues.
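The difference between the two settings can be illustrated with a toy sampler. This is a sketch under stated assumptions, not the actual generation code: the three-token vocabulary, its probabilities, and the helper names are all made up for illustration.

```python
import random

EOS = 0                        # hypothetical EOS id in this toy vocabulary
VOCAB_PROBS = [0.5, 0.3, 0.2]  # toy next-token distribution; index 0 is EOS


def suppress_eos(probs, eos_id):
    """Mimic min_length: remove EOS mass and renormalize the rest."""
    masked = [0.0 if i == eos_id else p for i, p in enumerate(probs)]
    total = sum(masked)
    return [p / total for p in masked]


def generate(max_new_tokens, rng, min_length=0, eos_token_id=EOS):
    """Sample tokens until eos_token_id is drawn or the budget is used up."""
    tokens = []
    for step in range(max_new_tokens):
        probs = VOCAB_PROBS
        if step < min_length:
            # EOS cannot be sampled here: the sampling distribution changes.
            probs = suppress_eos(probs, EOS)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(token)
        if token == eos_token_id:  # never true when eos_token_id == -1
            break
    return tokens


rng = random.Random(0)
print(generate(8, rng))                   # may stop after very few tokens
print(generate(8, rng, min_length=8))     # full length, EOS never appears
print(generate(8, rng, eos_token_id=-1))  # full length, EOS may appear
```

With min_length, tokens are drawn from a distribution that differs from the model's unmodified one, whereas with eos_token_id=-1 the distribution is untouched and only the stopping condition is disabled. That difference may be related to the negative KL mentioned above, although the thread does not confirm the mechanism.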

lvwerra avatar Mar 13 '23 18:03 lvwerra

Closing for now, feel free to re-open if there's an update.

lvwerra avatar Apr 14 '23 08:04 lvwerra