PPO Agent with masked actions
Hello,
when I'm using PPOAgent with masked actions, I wrap both the actor network and the value network with MaskSplitterNetwork.
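For context, the splitter function I pass to MaskSplitterNetwork looks roughly like this (the function name is illustrative; the dict keys match my observation spec, which has an 'observation' tensor and a 'valid_actions' boolean mask):

```python
# Illustrative splitter for MaskSplitterNetwork: given the dict observation,
# return the tensor the wrapped network should see and the boolean mask of
# valid actions. Both the actor and the value network are wrapped with it.
def split_observation_and_mask(observation):
    return observation['observation'], observation['valid_actions']

# Sketch of the wrapping (requires tf_agents, shown here as comments):
#   actor_net = mask_splitter_network.MaskSplitterNetwork(
#       split_observation_and_mask, wrapped_actor_net, passthrough_mask=True)
#   value_net = mask_splitter_network.MaskSplitterNetwork(
#       split_observation_and_mask, wrapped_value_net)
```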
However, when trying to train the agent, I get this error message:
File "/tf/ppo_example/ppo.py", line 335, in <module>
    multiprocessing.handle_main(functools.partial(app.run, main))
File "/usr/local/lib/python3.8/dist-packages/tf_agents/system/default/multiprocessing_core.py", line 77, in handle_main
    return app.run(parent_main_fn, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
File "/tf/WORKING DQN/ppo.py", line 322, in main
    train_eval(
File "/usr/local/lib/python3.8/dist-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
File "/usr/local/lib/python3.8/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
File "/usr/local/lib/python3.8/dist-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
File "/tf/WORKING DQN/ppo.py", line 275, in train_eval
    total_loss, _ = train_step()
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "/tf/WORKING DQN/ppo.py", line 245, in train_step
    return tf_agent.train(experience=trajectories)
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 336, in train
    loss_info = self._train_fn(
File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ppo/ppo_agent.py", line 781, in _train
    processed_experience = self._preprocess(experience)
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ppo/ppo_agent.py", line 717, in _preprocess
    value_preds, _ = self._collect_policy.apply_value_network(
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ppo/ppo_policy.py", line 214, in apply_value_network
    return self._value_network(observations, step_types, value_state,
File "/usr/local/lib/python3.8/dist-packages/tf_agents/networks/network.py", line 427, in __call__
    outputs, new_state = super(Network, self).__call__(**normalized_kwargs)  # pytype: disable=attribute-error  # typed-keras
File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.8/dist-packages/tf_agents/networks/mask_splitter_network.py", line 115, in call
    return self._wrapped_network(
File "/usr/local/lib/python3.8/dist-packages/tf_agents/networks/network.py", line 427, in __call__
    outputs, new_state = super(Network, self).__call__(**normalized_kwargs)  # pytype: disable=attribute-error  # typed-keras
TypeError: Exception encountered when calling layer "MaskSplitterNetwork" (type MaskSplitterNetwork).

call() got an unexpected keyword argument 'mask'

Call arguments received by layer "MaskSplitterNetwork" (type MaskSplitterNetwork):
  • observation={'observation': 'tf.Tensor(shape=(30, None, 2, 4), dtype=int32)', 'valid_actions': 'tf.Tensor(shape=(30, None, 1296), dtype=bool)'}
  • step_type=tf.Tensor(shape=(30, None), dtype=int32)
  • network_state=()
  • training=False
  • kwargs=<class 'inspect._empty'>
In call to configurable 'train_eval' (<function train_eval at 0x7ff4c3f423a0>)
I worked around it by adding a mask=None argument to the call() function in tf_agents.networks.value_network.ValueNetwork, ending up with:
def call(self, observation, step_type=None, network_state=(), training=False, mask=None):
    state, network_state = self._encoder(
        observation, step_type=step_type, network_state=network_state,
        training=training)
    value = self._postprocessing_layers(state, training=training)
    return tf.squeeze(value, -1), network_state
My agent is now able to learn and produces the expected results. I'm not sure whether this is a bug or whether I'm supposed to do this differently, but it works.
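As I understand the failure mode, it reduces to plain Python: the mask-splitting wrapper forwards a mask keyword argument to the wrapped call(), whose signature doesn't declare it, and adding an (ignored) mask=None parameter makes the call succeed. A minimal sketch with toy stand-in functions (not the actual TF-Agents code):

```python
# Toy stand-ins for the wrapped value network's call(), before and after the patch.
def value_call_original(observation, step_type=None):
    return sum(observation)

def value_call_patched(observation, step_type=None, mask=None):
    # mask is accepted but deliberately ignored: the value network doesn't
    # use it, it only has to tolerate the keyword the wrapper forwards.
    return sum(observation)

def wrapper_forwarding_mask(call_fn, observation, mask):
    # Mirrors what the wrapper does: forward the split-off mask as a
    # keyword argument to the wrapped call.
    return call_fn(observation, mask=mask)
```

Calling `wrapper_forwarding_mask(value_call_original, ...)` raises the same TypeError ("got an unexpected keyword argument 'mask'"), while the patched version goes through.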