ValueError invalid values

Open · willxxy opened this issue 1 year ago · 1 comment

I get the following error when I run the train.py script:

  File "scripts/train.py", line 256, in <module>
    train_go1(headless=False)
  File "scripts/train.py", line 216, in train_go1
    runner.learn(num_learning_iterations=100000, init_at_random_ep_len=True, eval_freq=100)
  File "/data/william/walk-these-ways/go1_gym_learn/ppo_cse/__init__.py", line 204, in learn
    mean_value_loss, mean_surrogate_loss, mean_adaptation_module_loss, mean_decoder_loss, mean_decoder_loss_student, mean_adaptation_module_test_loss, mean_decoder_test_loss, mean_decoder_test_loss_student = self.alg.update()
  File "/data/william/walk-these-ways/go1_gym_learn/ppo_cse/ppo.py", line 110, in update
    self.actor_critic.act(obs_history_batch, masks=masks_batch)
  File "/data/william/walk-these-ways/go1_gym_learn/ppo_cse/actor_critic.py", line 119, in act
    self.update_distribution(observation_history)
  File "/data/william/walk-these-ways/go1_gym_learn/ppo_cse/actor_critic.py", line 116, in update_distribution
    self.distribution = Normal(mean, mean * 0. + self.std)
  File "/home/william/anaconda3/envs/rob/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/william/anaconda3/envs/rob/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (24000, 12)) of distribution Normal(loc: torch.Size([24000, 12]), scale: torch.Size([24000, 12])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<AddmmBackward0>)

willxxy avatar Sep 04 '23 17:09 willxxy

Hi @willxxy ,

Sorry for leaving this issue unanswered for so long. We've recently noticed a similar issue with some newer versions of PyTorch. If you still encounter this error (or if any future user comes across this post), could you please confirm your PyTorch version and try running the script in an environment with torch==1.10.0+cu113?
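For anyone confirming their version, here is a minimal sketch of one way to compare the installed PyTorch build against the one suggested above. The helper names `parse_torch_version` and `matches_suggested` are hypothetical and not part of walk-these-ways or PyTorch; the only PyTorch attribute used is `torch.__version__`.

```python
import re

# Version string suggested in this thread; adjust if the maintainers update it.
SUGGESTED = "1.10.0+cu113"


def parse_torch_version(version):
    """Split a string like '1.10.0+cu113' into ((1, 10, 0), 'cu113')."""
    # Everything after '+' is the local build tag (e.g. the CUDA variant).
    base, _, local = str(version).partition("+")
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", base)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(g) for g in match.groups()), local


def matches_suggested(version, suggested=SUGGESTED):
    """Return True if both the release numbers and the build tag match."""
    return parse_torch_version(version) == parse_torch_version(suggested)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        installed = str(torch.__version__)
        print("installed:", installed)
        print("matches", SUGGESTED + ":", matches_suggested(installed))
```

Running this inside the same conda environment used for scripts/train.py prints the installed version and whether it matches, which is the information requested above. A mismatch only means the environment differs from the suggested one; it does not by itself prove the version is the cause of the NaN values.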

gmargo11 avatar Dec 29 '23 21:12 gmargo11