deep-rl-class
[HANDS-ON BUG] Unit 7
Describe the bug
Running the shared code on my laptop, without modifying any of the hyperparameters, crashes after roughly 210k training steps with the error message: probability tensor contains either `inf`, `nan` or element < 0. The full log is below.
Material
- Did you use Google Colab? No
If not:
- Your Operating system (OS): macOS
- Version of your OS: 15.3.1
(rl) amir@Laptop-Amir ml-agents % mlagents-learn ./config/poca/SoccerTwos.yaml --env=./training-envs-executables/SoccerTwos/SoccerTwos.app --run-id="SoccerTwos-t1" --no-graphics
        (Unity ML-Agents ASCII art banner omitted)
Version information:
ml-agents: 1.2.0.dev0,
ml-agents-envs: 1.2.0.dev0,
Communicator API: 1.5.0,
PyTorch: 2.6.0
[INFO] Connected to Unity environment with package version 2.3.0-exp.3 and communication version 1.5.0
[INFO] Connected new brain: SoccerTwos?team=1
[INFO] Connected new brain: SoccerTwos?team=0
[INFO] Hyperparameters for behavior name SoccerTwos:
        trainer_type: poca
        hyperparameters:
            batch_size: 2048
            buffer_size: 20480
            learning_rate: 0.0003
            beta: 0.005
            epsilon: 0.2
            lambd: 0.95
            num_epoch: 3
            learning_rate_schedule: constant
            beta_schedule: constant
            epsilon_schedule: constant
        checkpoint_interval: 500000
        network_settings:
            normalize: False
            hidden_units: 512
            num_layers: 2
            vis_encode_type: simple
            memory: None
            goal_conditioning_type: hyper
            deterministic: False
        reward_signals:
            extrinsic:
                gamma: 0.99
                strength: 1.0
                network_settings:
                    normalize: False
                    hidden_units: 128
                    num_layers: 2
                    vis_encode_type: simple
                    memory: None
                    goal_conditioning_type: hyper
                    deterministic: False
        init_path: None
        keep_checkpoints: 5
        even_checkpoints: False
        max_steps: 50000000
        time_horizon: 1000
        summary_freq: 10000
        threaded: False
        self_play:
            save_steps: 50000
            team_change: 200000
            swap_steps: 2000
            window: 10
            play_against_latest_model_ratio: 0.5
            initial_elo: 1200.0
        behavioral_cloning: None
/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/utils.py:289: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3729.)
torch.nn.functional.one_hot(_act.T, action_size[i]).float()
[INFO] SoccerTwos. Step: 10000. Time Elapsed: 47.233 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1199.251.
[INFO] SoccerTwos. Step: 20000. Time Elapsed: 84.973 s. Mean Reward: 0.000. Mean Group Reward: -0.170. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 30000. Time Elapsed: 107.770 s. Mean Reward: 0.000. Mean Group Reward: 0.079. Training. ELO: 1199.001.
[INFO] SoccerTwos. Step: 40000. Time Elapsed: 153.743 s. Mean Reward: 0.000. Mean Group Reward: -0.167. Training. ELO: 1198.256.
[INFO] SoccerTwos. Step: 50000. Time Elapsed: 199.639 s. Mean Reward: 0.000. Mean Group Reward: 0.214. Training. ELO: 1197.823.
[INFO] SoccerTwos. Step: 60000. Time Elapsed: 225.712 s. Mean Reward: 0.000. Mean Group Reward: -0.159. Training. ELO: 1197.535.
[INFO] SoccerTwos. Step: 70000. Time Elapsed: 277.827 s. Mean Reward: 0.000. Mean Group Reward: -0.122. Training. ELO: 1196.114.
[INFO] SoccerTwos. Step: 80000. Time Elapsed: 298.038 s. Mean Reward: 0.000. Mean Group Reward: -0.086. Training. ELO: 1196.068.
[INFO] SoccerTwos. Step: 90000. Time Elapsed: 349.539 s. Mean Reward: 0.000. Mean Group Reward: -0.019. Training. ELO: 1197.252.
[INFO] SoccerTwos. Step: 100000. Time Elapsed: 388.232 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1200.413.
[INFO] SoccerTwos. Step: 110000. Time Elapsed: 420.512 s. Mean Reward: 0.000. Mean Group Reward: 0.143. Training. ELO: 1202.923.
[INFO] SoccerTwos. Step: 120000. Time Elapsed: 461.744 s. Mean Reward: 0.000. Mean Group Reward: 0.281. Training. ELO: 1205.046.
[INFO] SoccerTwos. Step: 130000. Time Elapsed: 494.701 s. Mean Reward: 0.000. Mean Group Reward: -0.117. Training. ELO: 1205.951.
[INFO] SoccerTwos. Step: 140000. Time Elapsed: 527.693 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training.
[INFO] SoccerTwos. Step: 150000. Time Elapsed: 561.406 s. Mean Reward: 0.000. Mean Group Reward: -0.003. Training. ELO: 1206.395.
[INFO] SoccerTwos. Step: 160000. Time Elapsed: 591.341 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training.
[INFO] SoccerTwos. Step: 170000. Time Elapsed: 624.032 s. Mean Reward: 0.000. Mean Group Reward: -0.016. Training. ELO: 1206.998.
[INFO] SoccerTwos. Step: 180000. Time Elapsed: 674.598 s. Mean Reward: 0.000. Mean Group Reward: -0.042. Training. ELO: 1207.747.
[INFO] SoccerTwos. Step: 190000. Time Elapsed: 696.535 s. Mean Reward: 0.000. Mean Group Reward: -0.200. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 200000. Time Elapsed: 735.584 s. Mean Reward: 0.000. Mean Group Reward: 0.000. Training. ELO: 1207.996.
[INFO] SoccerTwos. Step: 210000. Time Elapsed: 781.218 s. Mean Reward: 0.000. Mean Group Reward: 0.026. Training. ELO: 1207.007.
Traceback (most recent call last):
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 175, in start_learning
n_steps = self.advance(env_manager)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 233, in advance
new_step_infos = env_manager.get_steps()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 124, in get_steps
new_step_infos = self._step()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 408, in _step
self._queue_steps()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 302, in _queue_steps
env_action_info = self._take_step(env_worker.previous_step)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 543, in _take_step
all_action_info[brain_name] = self.policies[brain_name].get_action(
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 130, in get_action
run_out = self.evaluate(decision_requests, global_agent_ids)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 100, in evaluate
action, run_out, memories = self.actor.get_action_and_stats(
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 640, in get_action_and_stats
action, log_probs, entropies = self.action_model(encoding, masks)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 227, in forward
actions = self._sample_action(dists)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 96, in _sample_action
discrete_action.append(discrete_dist.sample())
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/amir/anaconda3/envs/rl/bin/mlagents-learn", line 33, in <module>
sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 270, in main
run_cli(parse_command_line())
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 266, in run_cli
run_training(run_seed, options, num_areas)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/learn.py", line 138, in run_training
tc.start_learning(env_manager)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 200, in start_learning
self._save_models()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 80, in _save_models
self.trainers[brain_name].save_model()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/ghost/trainer.py", line 334, in save_model
self.trainer.save_model()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 172, in save_model
model_checkpoint = self._checkpoint()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/trainer/rl_trainer.py", line 144, in _checkpoint
export_path, auxillary_paths = self.model_saver.save_checkpoint(
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 60, in save_checkpoint
self.export(checkpoint_path, behavior_name)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py", line 65, in export
self.exporter.export_policy_model(output_filepath)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/model_serialization.py", line 164, in export_policy_model
torch.onnx.export(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/__init__.py", line 383, in export
export(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 495, in export
_export(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1428, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 1053, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 937, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/onnx/utils.py", line 844, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 1498, in _get_trace_graph
outs = ONNXTracedModule(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 138, in forward
graph, _out = torch._C._create_graph_by_tracing(
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/jit/_trace.py", line 129, in wrapper
outs.append(self.inner(*trace_inputs))
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/amir/anaconda3/envs/rl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in _slow_forward
result = self.forward(*input, **kwargs)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 692, in forward
) = self.action_model.get_action_out(encoding, masks)
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 191, in get_action_out
discrete_out_list = [
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/action_model.py", line 192, in <listcomp>
discrete_dist.exported_model_output()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 149, in exported_model_output
return self.sample()
File "/Users/amir/Documents/DeepRL/ml-agents/ml-agents/mlagents/trainers/torch_entities/distributions.py", line 124, in sample
return torch.multinomial(self.probs, 1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
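For context (my own reading of the trace, not something taken from the ML-Agents sources): the RuntimeError comes straight from torch.multinomial in distributions.py line 124, which raises exactly this message whenever the probability tensor it samples from contains NaN/inf or a negative entry. So the policy's action distribution appears to have diverged to NaN somewhere around step 210k. A minimal sketch reproducing the same message:

    import torch

    probs = torch.tensor([[0.25, 0.25, 0.25, 0.25]])   # healthy action distribution
    print(torch.multinomial(probs, 1))                  # samples fine

    probs[0, 0] = float("nan")                          # a diverged policy head would produce NaN probabilities
    try:
        torch.multinomial(probs, 1)
    except RuntimeError as e:
        print(e)  # probability tensor contains either `inf`, `nan` or element < 0

The second traceback looks like a consequence of the first: once start_learning fails, the trainer controller tries to save and export the model via torch.onnx.export, and the ONNX tracing path calls the same sample() in distributions.py line 124 with the same NaN probabilities, so it dies with the identical error.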