PPO Agent with masked actions
Hello,
when I'm using PPOAgent with masked actions, I wrap both the actor network and the value network with MaskSplitterNetwork.
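For context, the splitter function I pass to MaskSplitterNetwork looks roughly like this (the function name is illustrative; the dict keys match my observation spec, which has an 'observation' tensor and a 'valid_actions' boolean mask):

```python
# Illustrative splitter for MaskSplitterNetwork: given the dict observation,
# return the tensor the wrapped network should see and the boolean mask of
# valid actions. Both the actor and the value network are wrapped with it.
def split_observation_and_mask(observation):
    return observation['observation'], observation['valid_actions']

# Sketch of the wrapping (requires tf_agents, shown here as comments):
#   actor_net = mask_splitter_network.MaskSplitterNetwork(
#       split_observation_and_mask, wrapped_actor_net, passthrough_mask=True)
#   value_net = mask_splitter_network.MaskSplitterNetwork(
#       split_observation_and_mask, wrapped_value_net)
```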
However, when trying to train the agent, I get this error message:
File "/tf/ppo_example/ppo.py", line 335, in <module>
    multiprocessing.handle_main(functools.partial(app.run, main))
File "/usr/local/lib/python3.8/dist-packages/tf_agents/system/default/multiprocessing_core.py", line 77, in handle_main
    return app.run(parent_main_fn, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 312, in run
    _run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
File "/tf/WORKING DQN/ppo.py", line 322, in main
    train_eval(
File "/usr/local/lib/python3.8/dist-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
File "/usr/local/lib/python3.8/dist-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
File "/usr/local/lib/python3.8/dist-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
File "/tf/WORKING DQN/ppo.py", line 275, in train_eval
    total_loss, _ = train_step()
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "/tf/WORKING DQN/ppo.py", line 245, in train_step
    return tf_agent.train(experience=trajectories)
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 336, in train
    loss_info = self._train_fn(
File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ppo/ppo_agent.py", line 781, in _train
    processed_experience = self._preprocess(experience)
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ppo/ppo_agent.py", line 717, in _preprocess
    value_preds, _ = self._collect_policy.apply_value_network(
File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/ppo/ppo_policy.py", line 214, in apply_value_network
    return self._value_network(observations, step_types, value_state,
File "/usr/local/lib/python3.8/dist-packages/tf_agents/networks/network.py", line 427, in __call__
    outputs, new_state = super(Network, self).__call__(**normalized_kwargs)  # pytype: disable=attribute-error  # typed-keras
File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.8/dist-packages/tf_agents/networks/mask_splitter_network.py", line 115, in call
    return self._wrapped_network(
File "/usr/local/lib/python3.8/dist-packages/tf_agents/networks/network.py", line 427, in __call__
    outputs, new_state = super(Network, self).__call__(**normalized_kwargs)  # pytype: disable=attribute-error  # typed-keras
TypeError: Exception encountered when calling layer "MaskSplitterNetwork" (type MaskSplitterNetwork).

call() got an unexpected keyword argument 'mask'

Call arguments received by layer "MaskSplitterNetwork" (type MaskSplitterNetwork):
  • observation={'observation': 'tf.Tensor(shape=(30, None, 2, 4), dtype=int32)', 'valid_actions': 'tf.Tensor(shape=(30, None, 1296), dtype=bool)'}
  • step_type=tf.Tensor(shape=(30, None), dtype=int32)
  • network_state=()
  • training=False
  • kwargs=<class 'inspect._empty'>
In call to configurable 'train_eval' (<function train_eval at 0x7ff4c3f423a0>)
I worked around it by adding a mask=None argument to the call() function in tf_agents.networks.value_network.ValueNetwork, ending up with:
def call(self, observation, step_type=None, network_state=(), training=False, mask=None):
    state, network_state = self._encoder(
        observation, step_type=step_type, network_state=network_state,
        training=training)
    value = self._postprocessing_layers(state, training=training)
    return tf.squeeze(value, -1), network_state
My agent is now able to learn and produces the expected results. I'm not sure whether this is a bug or whether I'm supposed to do this differently, but it works.
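As I understand the failure mode, it reduces to plain Python: the mask-splitting wrapper forwards a mask keyword argument to the wrapped call(), whose signature doesn't declare it, and adding an (ignored) mask=None parameter makes the call succeed. A minimal sketch with toy stand-in functions (not the actual TF-Agents code):

```python
# Toy stand-ins for the wrapped value network's call(), before and after the patch.
def value_call_original(observation, step_type=None):
    return sum(observation)

def value_call_patched(observation, step_type=None, mask=None):
    # mask is accepted but deliberately ignored: the value network doesn't
    # use it, it only has to tolerate the keyword the wrapper forwards.
    return sum(observation)

def wrapper_forwarding_mask(call_fn, observation, mask):
    # Mirrors what the wrapper does: forward the split-off mask as a
    # keyword argument to the wrapped call.
    return call_fn(observation, mask=mask)
```

Calling `wrapper_forwarding_mask(value_call_original, ...)` raises the same TypeError ("got an unexpected keyword argument 'mask'"), while the patched version goes through.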