Deep-Reinforcement-Learning-Algorithms-with-PyTorch
Getting NaN as reward during PPO training
Hi,
I ran the provided results code for Mountain Car. After some number of episodes the returned reward becomes NaN and training essentially stops. In the final graph the line simply ends at this point.
I'm not really sure what is causing this, as the rewards up to that point are quite normal and the agent is close to solving the environment. I've tried stepping through the code to see if anything is obviously causing the issue, but nothing jumps out.
I was just wondering if you have come across this type of issue when running these agents before.
Thanks.

Hi!
I see. The NaN rewards might be because the environment is returning extremely large rewards, which I think can happen in this environment if the agent starts choosing really big numbers as its actions.
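For what it's worth, here's a minimal sketch of how that could turn into NaN, assuming the standard gym MountainCarContinuous-v0 per-step reward of roughly -0.1 * action² (an assumption about the environment, not something specific to this repo):

```python
import numpy as np

# Hypothetical illustration, assuming a MountainCarContinuous-style
# per-step reward of -0.1 * action^2.
def step_reward(action):
    return -0.1 * action ** 2

print(step_reward(np.float64(1.0)))    # -0.1: a normal reward
huge = step_reward(np.float64(1e200))  # the square overflows float64
print(huge)                            # -inf
print(huge - huge)                     # nan: once inf enters the reward
                                       # stream, running means/stds over
                                       # rewards become NaN
```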
I can't seem to replicate it on my computer though. Does it happen every time or just sometimes? And does the random seed impact it?
One thing that might stop it from happening is changing this line in the "Policy_Gradient_Agents" hyperparameter dictionary in the file Results/Mountain_Car_Continuous/Results.py:
from
"final_layer_activation": None
to
"final_layer_activation": "TANH"
Let me know if that works?
Hi,
Thanks for such a quick response. From some playing around I can tell that the seed doesn't seem to have an effect. It happens every time.
I managed to sort the issue out by clipping the action to a finite range, which does indeed suggest it is caused by the agent picking very large action values and then receiving a very large negative reward for doing so. Interestingly, changing the final layer activation to TANH doesn't seem to prevent the issue, even though it should essentially clip the actions to be between -1 and 1.
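In case it helps anyone else, here's a minimal sketch of the workaround, assuming the classic gym API (4-tuple step); the random "policy" is just a stand-in for the actual PPO agent's output:

```python
import gym
import numpy as np

env = gym.make("MountainCarContinuous-v0")
state = env.reset()
for _ in range(200):
    raw_action = np.random.randn(1) * 1e3  # deliberately extreme output
    # Clip to the environment's bounds ([-1, 1] here) so a wild network
    # output can never trigger a huge negative reward.
    action = np.clip(raw_action, env.action_space.low, env.action_space.high)
    state, reward, done, info = env.step(action)
    if done:
        state = env.reset()
```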