reinforcement-learning
Catastrophic collapse in episode score on cartpole_a3c
Hi! First of all, I just want to say awesome work on the library overall. I really love the concept :+1:
I have an issue where cartpole_a3c converges relatively quickly (around episode 300-400), keeps doing well for a while, and then suddenly collapses and never recovers. Has anyone else experienced this?
Thank you for the compliment.
Currently, the reward given to the agent when an episode ends before 500 timesteps is -100.
I think this penalty is too large, so it can hurt the network's stability. I am curious what happens if you change the reward from -100 to -10 or -1.
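A minimal sketch of the change being suggested, assuming the training loop exposes a `done` flag and the current timestep (the function name and signature here are hypothetical, not the library's actual API):

```python
def shaped_reward(raw_reward, done, timestep, max_steps=500, penalty=-1.0):
    """Replace the large -100 early-termination penalty with a milder one.

    If the episode ends before max_steps, return `penalty` (e.g. -1)
    instead of -100; otherwise pass the environment's reward through.
    """
    if done and timestep < max_steps:
        return penalty
    return raw_reward
```

The idea is that a terminal penalty on the same scale as the per-step rewards (+1) produces much smaller return variance than -100 does.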
There could be many reasons behind catastrophic collapse: the learning rate; gamma, which is the discount rate applied to rewards; etc.
One common solution is gradient clipping. By clipping the gradient vectors, you limit the impact of high-variance updates (e.g. a -100 reward arriving after a series of +1 rewards).
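To illustrate gradient clipping by global norm (the variant commonly used in A3C implementations), here is a self-contained sketch that treats the gradients as a flat list of floats; in practice you would use your framework's built-in, e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch:

```python
import math

def clip_by_global_norm(grads, max_norm=40.0):
    """Rescale gradients so their global L2 norm is at most max_norm.

    If the combined norm exceeds max_norm, every gradient is scaled by
    the same factor, preserving the update direction while bounding its
    magnitude. Otherwise the gradients are returned unchanged.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        return [g * scale for g in grads]
    return list(grads)
```

Because the direction is preserved, a single outlier step (like the one triggered by a -100 terminal reward) can no longer blow the policy weights far away from a good solution.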