
[HANDS-ON BUG] Unit 7

Open · amostof opened this issue 9 months ago • 0 comments

Describe the bug

Running the shared code on my laptop, without modifying any of the hyperparameters, eventually fails with the error message: `probability tensor contains either inf, nan or element < 0`.
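
For context (this snippet is not from the original report), a minimal sketch of how `torch.multinomial` raises this exact error once the policy's action probabilities contain NaN or inf values:

```python
import torch

# Hypothetical illustration: torch.multinomial rejects probability rows that
# contain NaN/inf or negative entries, which is the failure seen in the log below.
probs = torch.tensor([[0.2, 0.3, 0.5],
                      [float("nan"), 0.3, 0.5]])
try:
    torch.multinomial(probs, 1)
except RuntimeError as e:
    print(e)  # probability tensor contains either `inf`, `nan` or element < 0
```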

Material

  • Did you use Google Colab? No

If not:

  • Your Operating system (OS): macOS
  • Version of your OS: 15.3.1
(rl) amir@Laptop-Amir ml-agents % mlagents-learn ./config/poca/SoccerTwos.yaml --env=./training-envs-executables/SoccerTwos/SoccerTwos.app --run-id="SoccerTwos-t1" --no-graphics     

            ┐  ╖
        ╓╖╬│╡  ││╬╖╖
    ╓╖╬│││││┘  ╬│││││╬╖
 ╖╬│││││╬╜        ╙╬│││││╖╖                               ╗╗╗
 ╬╬╬╬╖││╦╖        ╖╬││╗╣╣╣╬      ╟╣╣╬    ╟╣╣╣             ╜╜╜  ╟╣╣
 ╬╬╬╬╬╬╬╬╖│╬╖╖╓╬╪│╓╣╣╣╣╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╒╣╣╖╗╣╣╣╗   ╣╣╣ ╣╣╣╣╣╣ ╟╣╣╖   ╣╣╣
 ╬╬╬╬┐  ╙╬╬╬╬│╓╣╣╣╝╜  ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╣╙ ╙╣╣╣  ╣╣╣ ╙╟╣╣╜╙  ╫╣╣  ╟╣╣
 ╬╬╬╬┐     ╙╬╬╣╣      ╫╣╣╣╬      ╟╣╣╬    ╟╣╣╣ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣     ╣╣╣┌╣╣╜
 ╬╬╬╜       ╬╬╣╣      ╙╝╣╣╬      ╙╣╣╣╗╖╓╗╣╣╣╜ ╟╣╣╬   ╣╣╣  ╣╣╣  ╟╣╣╦╓    ╣╣╣╣╣
 ╙   ╓╦╖    ╬╬╣╣   ╓╗╗╖            ╙╝╣╣╣╣╝╜   ╘╝╝╜   ╝╝╝  ╝╝╝   ╙╣╣╣    ╟╣╣╣
   ╩╬╬╬╬╬╬╦╦╬╬╣╣╗╣╣╣╣╣╣╣╝                                             ╫╣╣╣╣
      ╙╬╬╬╬╬╬╬╣╣╣╣╣╣╝╜
          ╙╬╬╬╣╣╣╜
             ╙
        
 Version information:
  ml-agents: 1.2.0.dev0,
  ml-agents-envs: 1.2.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 2.6.0
[INFO] Connected to Unity environment with package version 2.3.0-exp.3 and communication version 1.5.0
[INFO] Connected new brain: SoccerTwos?team=1
[INFO] Connected new brain: SoccerTwos?team=0
[INFO] Hyperparameters for behavior name SoccerTwos: 
        trainer_type:   poca
        hyperparameters:
          batch_size:   2048
          buffer_size:  20480
          learning_rate:        0.0003
          beta: 0.005
          epsilon:      0.2
          lambd:        0.95
          num_epoch:    3
          learning_rate_schedule:       constant
          beta_schedule:        constant
          epsilon_schedule:     constant
        checkpoint_interval:    500000
        network_settings:
          normalize:    False
          hidden_units: 512
          num_layers:   2
          vis_encode_type:      simple
          memory:       None
          goal_conditioning_type:       hyper
          deterministic:        False
        reward_signals:
          extrinsic:
            gamma:      0.99
            strength:   1.0
            network_settings:
              normalize:        False
              hidden_units:     128
              num_layers:       2
              vis_encode_type:  simple
              memory:   None
              goal_conditioning_type:   hyper
              deterministic:    False
        init_path:      None
        keep_checkpoints:       5
        even_checkpoints:       False
        max_steps:      50000000
        time_horizon:   1000
        summary_freq:   10000
        threaded:       False
        self_play:
          save_steps:   50000
          team_change:  200000
          swap_steps:   2000
          window:       10
          play_against_latest_model_ratio:      0.5
          initial_elo:  1200.0
        behavioral_cloning:     None
/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/utils.py:289: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3729.)
  torch.nn.functional.one_hot(_act.T, action_size[i]).float()
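
(Aside, not part of the log: the `UserWarning` above is a deprecation notice and appears separate from the later crash. A small sketch of the replacements PyTorch suggests for `x.T` on tensors that are not 2-D:)

```python
import torch

# Hypothetical illustration of the deprecation warning above: for a non-2-D
# tensor, x.T is deprecated; the suggested replacements either reverse all
# dimensions explicitly or transpose only the last two ("batch of matrices").
x = torch.randn(4, 3, 2)
reversed_dims = x.permute(*torch.arange(x.ndim - 1, -1, -1))  # shape (2, 3, 4)
batch_transpose = x.mT                                        # shape (4, 2, 3)
print(reversed_dims.shape, batch_transpose.shape)
```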
[INFO] SoccerTwos. Step: 10000. Time Elapsed: 47.233 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1199.251.
[INFO] SoccerTwos. Step: 20000. Time Elapsed: 84.973 s. Mean Reward: 0.000. Mean Group Reward: -0.170. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 30000. Time Elapsed: 107.770 s. Mean Reward: 0.000. Mean Group Reward: 0.079. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 40000. Time Elapsed: 153.743 s. Mean Reward: 0.000. Mean Group Reward: -0.167. Training. ELO: 1198.256.
[INFO] SoccerTwos. Step: 50000. Time Elapsed: 199.639 s. Mean Reward: 0.000. Mean Group Reward: 0.214. Training. ELO: 1197.823.
[INFO] SoccerTwos. Step: 60000. Time Elapsed: 225.712 s. Mean Reward: 0.000. Mean Group Reward: -0.159. Training. ELO: 1197.535.
[INFO] SoccerTwos. Step: 70000. Time Elapsed: 277.827 s. Mean Reward: 0.000. Mean Group Reward: -0.122. Training. ELO: 1196.114.
[INFO] SoccerTwos. Step: 80000. Time Elapsed: 298.038 s. Mean Reward: 0.000. Mean Group Reward: -0.086. Training. ELO: 1196.068.
[INFO] SoccerTwos. Step: 90000. Time Elapsed: 349.539 s. Mean Reward: 0.000. Mean Group Reward: -0.019. Training. ELO: 1197.252.
[INFO] SoccerTwos. Step: 100000. Time Elapsed: 388.232 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1200.413.
[INFO] SoccerTwos. Step: 110000. Time Elapsed: 420.512 s. Mean Reward: 0.000. Mean Group Reward: 0.143. Training. ELO: 1202.923.
[INFO] SoccerTwos. Step: 120000. Time Elapsed: 461.744 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1205.046.
[INFO] SoccerTwos. Step: 130000. Time Elapsed: 494.701 s. Mean Reward: 0.000. Mean Group Reward: -0.117. Training. ELO: 1205.951.
[INFO] SoccerTwos. Step: 140000. Time Elapsed: 527.693 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training.
[INFO] SoccerTwos. Step: 150000. Time Elapsed: 561.406 s. Mean Reward: 0.000. Mean Group Reward: -0.003. Training. ELO: 1206.395.
[INFO] SoccerTwos. Step: 160000. Time Elapsed: 591.341 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training.
[INFO] SoccerTwos. Step: 170000. Time Elapsed: 624.032 s. Mean Reward: 0.000. Mean Group Reward: -0.016. Training. ELO: 1206.998.
[INFO] SoccerTwos. Step: 180000. Time Elapsed: 674.598 s. Mean Reward: 0.000. Mean Group Reward: -0.042. Training. ELO: 1207.747.
[INFO] SoccerTwos. Step: 190000. Time Elapsed: 696.535 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 200000. Time Elapsed: 735.584 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 210000. Time Elapsed: 781.218 s. Mean Reward: 0.000. Mean Group Reward: 0.026. Training. ELO: 1207.007.
Traceback (most recent call last):
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 175, in start_learning
    n_steps = self.advance(env_manager)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 233, in advance
    new_step_infos = env_manager.get_steps()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 124, in get_steps
    new_step_infos = self._step()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 408, in _step
    self._queue_steps()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 302, in _queue_steps
    env_action_info = self._take_step(env_worker.previous_step)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 543, in _take_step
    all_action_info[brain_name] = self.policies[brain_name].get_action(
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 130, in get_action
    run_out = self.evaluate(decision_requests, global_agent_ids)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 100, in evaluate
    action, run_out, memories = self.actor.get_action_and_stats(
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 640, in get_action_and_stats
    action, log_probs, entropies = self.action_model(encoding, masks)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 227, in forward
    actions = self._sample_action(dists)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 96, in _sample_action
    discrete_action.append(discrete_dist.sample())
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/amir/anaconda3/envs/rl/bin/mlagents-learn", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 270, in main
    run_cli(parse_command_line())
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 266, in run_cli
    run_training(run_seed, options, num_areas)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 138, in run_training
    tc.start_learning(env_manager)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 200, in start_learning
    self._save_models()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 80, in _save_models
    self.trainers[brain_name].save_model()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/ghost/trainer.py", line 334, in save_model
    self.trainer.save_model()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 172, in save_model
    model_checkpoint = self._checkpoint()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 144, in _checkpoint
    export_path, auxillary_paths = self.model_saver.save_checkpoint(
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 60, in save_checkpoint
    self.export(checkpoint_path, behavior_name)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 65, in export
    self.exporter.export_policy_model(output_filepath)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/model_serialization.py", line 164, in export_policy_model
    torch.onnx.export(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/__init__.py", line 383, in export
    export(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 495, in export
    _export(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1428, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1053, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 937, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 844, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 1498, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
    graph, _out = torch._C._create_graph_by_tracing(
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 129, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 692, in forward
    ) = self.action_model.get_action_out(encoding, masks)
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 191, in get_action_out
    discrete_out_list = [
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 192, in <listcomp>
    discrete_dist.exported_model_output()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 149, in exported_model_output
    return self.sample()
  File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
    return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
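
For anyone debugging this, here is a minimal, hypothetical sketch (not part of ml-agents) of a guard one could temporarily wrap around the sampling call in `distributions.py` to surface which probability rows go bad before `torch.multinomial` aborts; the helper name and error formatting are my own:

```python
import torch

def safe_multinomial(probs: torch.Tensor, num_samples: int = 1) -> torch.Tensor:
    # Hypothetical debugging helper: fail with a more informative message
    # when the policy's probabilities become NaN/inf or negative.
    if not torch.isfinite(probs).all() or (probs < 0).any():
        bad_rows = (~torch.isfinite(probs)).any(dim=-1) | (probs < 0).any(dim=-1)
        raise RuntimeError(
            f"invalid probability rows at indices "
            f"{bad_rows.nonzero().flatten().tolist()}: {probs[bad_rows]}"
        )
    return torch.multinomial(probs, num_samples)
```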

amostof · Feb 27 '25, 21:02