Youtube-Code-Repository
ValueError: The parameter loc has invalid values
I've downloaded your code and made the following small changes:
- removed all loading/checkpointing/saving functions/calls
- switched the gym environment to env = gym.make("InvertedPendulum-v2")
After some training (variable amount of time before error occurs) I get the following bug:
File "C:\Users\john\Desktop\project\Clone\sac_torch.py", line 32, in choose_action
    actions, _ = self.actor.sample_normal(state, reparameterize=False)
File "C:\Users\john\Desktop\project\Clone\networks.py", line 105, in sample_normal
    probabilities = Normal(mu, sigma)
File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\distribution.py", line 53, in __init__
    raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values
I printed out mu and sigma and saw that, immediately before the error, both have become NaN:
tensor([[nan]], device='cuda:0', grad_fn=<AddmmBackward>) tensor([[nan]], device='cuda:0', grad_fn=<ClampBackward1>)
(This appears to be occurring during a forward pass rather than during replay-buffer sampling, since the tensor holds a single sample rather than a full batch.)
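One way to catch this earlier (a debugging sketch, not part of the original repo) is to check mu and sigma for NaN/Inf right before constructing the distribution, so the failure points at the tensor that went bad instead of at Normal's validation; torch.autograd.set_detect_anomaly(True) can additionally pinpoint the backward op that first produced the NaN.

```python
import torch

def assert_finite(name, t):
    # Fail fast with a clear message instead of letting Normal(mu, sigma)
    # raise "The parameter loc has invalid values" later.
    if not torch.isfinite(t).all():
        raise ValueError(f"{name} contains NaN/Inf: {t}")

# Example usage right before `probabilities = Normal(mu, sigma)`:
mu = torch.tensor([[0.1]])
sigma = torch.tensor([[0.5]])
assert_finite("mu", mu)
assert_finite("sigma", sigma)
```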
Thanks again for the quick reply in your video!
In the sample_normal method of ActorNetwork, we compute log_probs using log(1 - actions + epsilon); however, action was defined earlier as tanh(actions) * max_actions, which may be greater than 0. Computing log_probs with tanh(actions) instead of tanh(actions) * max_actions may fix it (it solved the same issue for me with the Pendulum-v0 env).
Edit: tanh(actions) * max_actions may be greater than 1, which in turn makes the argument of the log negative and produces NaN.
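The fix above can be sketched as follows. This is a minimal, hypothetical version of sample_normal, not the repo's actual code: the class and method names come from the traceback, but the layer sizes, max_action value, and the standard tanh-squashing correction term (1 - tanh(u)^2) are assumptions. The key point is that the correction term is computed from the unscaled tanh output, while the returned action is still scaled by max_action.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class ActorNetwork(nn.Module):
    # Minimal sketch: only the pieces relevant to the NaN fix are shown.
    def __init__(self, input_dims=4, n_actions=1, max_action=3.0):
        super().__init__()
        self.fc = nn.Linear(input_dims, 64)
        self.mu = nn.Linear(64, n_actions)
        self.sigma = nn.Linear(64, n_actions)
        self.max_action = max_action
        self.reparam_noise = 1e-6

    def forward(self, state):
        x = torch.relu(self.fc(state))
        mu = self.mu(x)
        # Clamp sigma away from zero so Normal(mu, sigma) stays valid.
        sigma = torch.clamp(self.sigma(x), min=self.reparam_noise, max=1.0)
        return mu, sigma

    def sample_normal(self, state, reparameterize=True):
        mu, sigma = self.forward(state)
        dist = Normal(mu, sigma)
        raw = dist.rsample() if reparameterize else dist.sample()
        squashed = torch.tanh(raw)           # always in (-1, 1)
        action = squashed * self.max_action  # scaled to the env's bounds
        log_probs = dist.log_prob(raw)
        # Fix: use the unscaled tanh output here. With the scaled action,
        # 1 - action**2 goes negative whenever |action| > 1 (possible as
        # soon as max_action > 1), and log of a negative number is NaN.
        log_probs -= torch.log(1 - squashed.pow(2) + self.reparam_noise)
        return action, log_probs.sum(dim=-1, keepdim=True)
```

With this change the log argument stays strictly positive regardless of max_action, which is why the NaN only appeared on environments whose action bound exceeds 1 (InvertedPendulum-v2's bound is 3, Pendulum-v0's is 2).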
Thanks @otouat, I just hit the same problem with tensor([[nan]]), so here is a big THANKS after two years!