reinforcement-learning Catastrophic collapse in episode score on cartpole

Hi, First of all I just want to say awesome work on the library overall, really love the concept :+1:

I have an issue where cartpole_a3c will converge relatively quickly (around ep 300-400). Then keep doing well, and then suddenly collapsing and not recovering. Has anyone else experienced this?

Jul 05 '17 07:07 erlendaxpo

Thank you for your compliment.

Now the reward which is given to agent when the episode is over before 500 timestep is -100.

I think this is too big so it can influence network's stability. I am curious about the result when change the reward from -100 to -10 or -1.

Jul 05 '17 08:07 dnddnjs

there could be many reasons behind catastrophic collapse: learning rate; gamma rate, which is the discount rate applied to rewards; etc.

one common solution is gradient clipping. by clipping gradient vectors, you minimize the impact of high variance situations (eg a -100 reward after a series of +1 rewards).

Jan 19 '18 16:01 mynameisvinn

Catastrophic collapse in episode score on cartpole_a3c