ValueError: The parameter loc has invalid values

Open · johnschwarcz opened this issue 2 years ago · 3 comments

I've downloaded your code and made the following small changes:
- removed all loading/checkpointing/saving functions and calls
- switched the gym environment to env = gym.make("InvertedPendulum-v2")

After some training (a variable amount of time passes before the error occurs) I get the following error:

  File "C:\Users\john\Desktop\project\Clone\sac_torch.py", line 32, in choose_action
    actions, _ = self.actor.sample_normal(state, reparameterize=False)
  File "C:\Users\john\Desktop\project\Clone\networks.py", line 105, in sample_normal
    probabilities = Normal(mu, sigma)
  File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\john\anaconda3\lib\site-packages\torch\distributions\distribution.py", line 53, in __init__
    raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter loc has invalid values

I printed out mu and sigma and see that, immediately before the error, both have become NaN:

  mu:    tensor([[nan]], device='cuda:0', grad_fn=<AddmmBackward>)
  sigma: tensor([[nan]], device='cuda:0', grad_fn=<ClampBackward1>)

(This appears to be occurring during a forward pass, not buffer sampling, since the tensor holds a single entry rather than a batch.)
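To pin down where the NaNs first appear, one option is to add a small finite-check right before Normal(mu, sigma) is constructed. A minimal sketch, assuming only that mu and sigma are the tensors returned by the forward pass in networks.py (the check_finite helper itself is hypothetical):

```python
import torch

def check_finite(name, tensor):
    # Fail fast with the offending values instead of waiting for Normal()
    # to raise "The parameter loc has invalid values".
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"{name} contains non-finite values: {tensor}")

# e.g. in ActorNetwork.sample_normal, just before Normal(mu, sigma):
#   check_finite("mu", mu)
#   check_finite("sigma", sigma)
```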

Thanks again for the quick reply in your video!

johnschwarcz · Aug 22 '21

In the sample_normal method of ActorNetwork, log_probs is computed using log(1 - action + epsilon); however, action was defined just before that as tanh(actions)*max_actions, which may be greater than 0. Computing log_probs with tanh(actions) instead of tanh(actions)*max_actions may fix it (it solved the same issue for me with the Pendulum-v0 env).

Edit: tanh(actions)*max_actions may be greater than 1, in which case 1 - action + epsilon can become negative and the log returns NaN.
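To make that concrete, here is a rough sketch of sample_normal with the correction applied. It mirrors the structure of ActorNetwork.sample_normal in networks.py, but attribute names such as self.max_action and self.reparam_noise are my approximations, so treat it as an illustration of the idea rather than an exact patch:

```python
import torch
from torch.distributions import Normal

def sample_normal(self, state, reparameterize=True):
    mu, sigma = self.forward(state)
    probabilities = Normal(mu, sigma)

    if reparameterize:
        actions = probabilities.rsample()   # sample with gradients (reparameterization trick)
    else:
        actions = probabilities.sample()

    tanh_actions = torch.tanh(actions)        # always inside (-1, 1)
    action = tanh_actions * self.max_action   # scaled action handed to the environment

    log_probs = probabilities.log_prob(actions)
    # The squashing correction must use the unscaled tanh(actions): with the scaled
    # action, 1 - action**2 can go negative once max_action > 1, and log() returns NaN.
    log_probs -= torch.log(1 - tanh_actions.pow(2) + self.reparam_noise)
    log_probs = log_probs.sum(1, keepdim=True)

    return action, log_probs
```

With the unscaled tanh, the argument of the log stays in (reparam_noise, 1 + reparam_noise], so the log can no longer be fed a negative number.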

otouat · Sep 04 '21

Thanks @otouat, I just hit the same problem with tensor([[nan]]), and here is a big thank you after two years!

peter890331 · Sep 19 '23