
PPO policy with ActorDistributionNetwork and discrete action array

Open cedavidyang opened this issue 2 years ago • 13 comments

I'm using PPOAgent and ActorDistributionNetwork with the following action_spec:

action_spec = array_spec.BoundedArraySpec(
    shape=(10,), dtype=np.int32, minimum=0, maximum=4, name='action')

However, I received the following error when the agent tried to initialize a PPOPolicy:

ValueError: actor_network output spec does not match action spec

The issue arises when executing the following check in the ppo_policy.py file (near line 112):

    distribution_utils.assert_specs_are_compatible(
        actor_output_spec, action_spec,
        'actor_network output spec does not match action spec')

The actor_output_spec of ActorDistributionNetwork has an event_shape of (), which is inconsistent with the action_spec. A similar issue has been reported in #548.

My workaround has been to comment out the spec-compatibility check in ppo_policy.py. After doing that, the code runs successfully and the agent is able to learn. But I'm not sure whether this is a bug or whether I'm missing something.
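
For reference, here's a minimal sketch that surfaces the same mismatch without constructing the agent (the float observation spec is just a placeholder for illustration):

import numpy as np
from tf_agents.networks import actor_distribution_network
from tf_agents.specs import array_spec, tensor_spec

action_spec = tensor_spec.from_spec(array_spec.BoundedArraySpec(
    shape=(10,), dtype=np.int32, minimum=0, maximum=4, name='action'))
# Placeholder observation spec, only needed to build the network.
observation_spec = tensor_spec.from_spec(array_spec.BoundedArraySpec(
    shape=(20,), dtype=np.float32, minimum=0.0, maximum=1.0, name='observation'))

actor_net = actor_distribution_network.ActorDistributionNetwork(
    observation_spec, action_spec)

# create_variables() builds the layers and returns the network's output spec,
# which is what PPOPolicy compares against the action spec.
actor_output_spec = actor_net.create_variables()
print(actor_output_spec)  # DistributionSpecV2 with event_shape=(), not (10,)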

cedavidyang avatar Sep 07 '21 23:09 cedavidyang

Thank you for reporting. It's a little hard to know exactly what's going on. Could you help print out both actor_output_spec and action_spec so we know why they don't match?

summer-yue avatar Oct 07 '21 23:10 summer-yue

Thanks for following up. My action_spec is

BoundedTensorSpec(
    shape=(10,),
    dtype=tf.int32,
    name='action',
    minimum=array(0, dtype=int32),
    maximum=array(3, dtype=int32))

And the actor_output_spec is

<DistributionSpecV2: event_shape=(), dtype=<dtype: 'int32'>,
parameters=<Params: type=<class 'tensorflow_probability.python.distributions.categorical.Categorical'>,
params={'logits': TensorSpec(shape=(10, 4), dtype=tf.float32, name=None)}>>

I believe the error occurs because, by default, the Categorical distribution in TensorFlow Probability has an empty event_shape=(), which is inconsistent with the shape=(10,) defined in action_spec.
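
A quick way to see this (illustrative only, outside the agent code):

import tensorflow as tf
import tensorflow_probability as tfp

# Logits of shape (10, 4): the trailing 4 is the number of classes, while the
# leading 10 becomes a *batch* dimension, so each event is a scalar class index.
dist = tfp.distributions.Categorical(logits=tf.zeros((10, 4)))
print(dist.event_shape)     # ()
print(dist.batch_shape)     # (10,)
print(dist.sample().shape)  # (10,) -- the 10 comes from batch dims, not event dims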

cedavidyang avatar Oct 08 '21 00:10 cedavidyang

Thanks for providing the additional information! I think you're right. I was able to reproduce your issue in a simple example in Colab. I'll follow up here with a more robust solution. Thanks for your patience.

import numpy as np
import tensorflow as tf
from tf_agents.specs import array_spec, tensor_spec
from tf_agents.trajectories import time_step as ts
from tf_agents.agents.ppo import ppo_actor_network, ppo_clip_agent
from tf_agents.networks import value_network
from tf_agents.networks import actor_distribution_network

action_spec = array_spec.BoundedArraySpec(
    shape=(10,), dtype=np.int32, minimum=0, maximum=4, name='action')
observation_spec = array_spec.BoundedArraySpec(
    shape=(20,), dtype=np.int32, minimum=0, maximum=1, name='observation')

action_tensor_spec = tensor_spec.from_spec(action_spec)
observation_tensor_spec = tensor_spec.from_spec(observation_spec)
time_step_tensor_spec = ts.time_step_spec(observation_tensor_spec)

actor_net_builder = ppo_actor_network.PPOActorNetwork()
actor_net = actor_distribution_network.ActorDistributionNetwork(
    observation_tensor_spec, action_tensor_spec)

value_net = value_network.ValueNetwork(
    observation_tensor_spec, fc_layer_params=(20, 20),
    kernel_initializer=tf.keras.initializers.Orthogonal())

agent = ppo_clip_agent.PPOClipAgent(
    time_step_tensor_spec, action_tensor_spec,
    actor_net=actor_net, value_net=value_net)

summer-yue avatar Oct 08 '21 23:10 summer-yue

Hi, I received the same error when using PPOAgent. I tried your code, but I still received the error. I tried it with two versions of tf-agents:

version 1: tf-agents 0.8.0, tensorflow 2.5, python 3.8
version 2: tf-agents 0.9.0, tensorflow 2.6, python 3.7

lonaeyeo avatar Nov 14 '21 17:11 lonaeyeo

@summer-yue @lonaeyeo The reproduction code above results in the following error:

ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(), dtype=tf.int32, name=None)
vs.
BoundedTensorSpec(shape=(10,), dtype=tf.int32, name='action', minimum=array(0, dtype=int32), maximum=array(4, dtype=int32))

To make it work, we need to change action_spec to be a scalar:

action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=4, name='action')
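
If the number of joint actions is small, one possible way to live with the scalar restriction (a sketch, not something I've verified with PPO here) is to enumerate the combinations into a single scalar and decode it inside the environment. Note this does not scale to the original shape=(10,) spec with 5 values per slot, which would need 5**10 joint actions.

import numpy as np
from tf_agents.specs import array_spec

# Hypothetical example: 3 sub-actions, each taking values 0..3.
NUM_SLOTS = 3
VALUES_PER_SLOT = 4

# One scalar action covering every combination: 4**3 = 64 joint actions.
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0,
    maximum=VALUES_PER_SLOT**NUM_SLOTS - 1, name='action')

def decode(scalar_action):
  # Unpack the scalar joint action into one value per slot (base-4 digits).
  values = []
  for _ in range(NUM_SLOTS):
    values.append(int(scalar_action) % VALUES_PER_SLOT)
    scalar_action //= VALUES_PER_SLOT
  return values

print(decode(27))  # [3, 2, 1]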

sibyjackgrove avatar Jan 20 '22 16:01 sibyjackgrove

Any updates on this? I would prefer to be able to use a discrete action array rather than being stuck with a scalar, and I've heard that commenting out the check creates other problems.

TheGreatRambler avatar Jan 21 '22 18:01 TheGreatRambler

@TheGreatRambler Perhaps I should have been clearer. You can use a discrete action space; what I meant is that the action has to be a single integer value, not a vector of integer values.

sibyjackgrove avatar Jan 21 '22 23:01 sibyjackgrove

Oh sorry, I'm a bit new to the terminology. I need to be able to use a vector of integer values. I commented out the check and the agent appears to be running, but is it actually learning?

TheGreatRambler avatar Jan 23 '22 20:01 TheGreatRambler

This is the spec in question. The first two integers are joystick axes; the last three are booleans, where 0 to 32767 is false and anything above is true.

self._action_spec = array_spec.BoundedArraySpec(
    shape=(5,), dtype=np.int32, minimum=0, maximum=65535, name='action')

TheGreatRambler avatar Jan 23 '22 21:01 TheGreatRambler

@TheGreatRambler I don't think the above spec will work if you have a mixture of booleans and integers. Joystick axis positions would be continuous actions. I think you will need two actor networks: one for the boolean values and one for the continuous joystick axes.
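
For illustration only (I haven't verified this end to end with PPO), the (5,) spec could be split into a continuous part for the joystick axes and a scalar discrete part for the buttons, e.g. by packing the three booleans into one bitmask so the discrete action stays scalar:

import numpy as np
from tf_agents.specs import array_spec

action_spec = {
    # Two joystick axes, normalized to [-1, 1] instead of raw 0..65535 counts.
    'joystick': array_spec.BoundedArraySpec(
        shape=(2,), dtype=np.float32, minimum=-1.0, maximum=1.0,
        name='joystick'),
    # Three boolean buttons packed into one scalar bitmask in 0..7, avoiding a
    # (3,)-shaped discrete spec that would hit the same event_shape mismatch.
    'buttons': array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=7, name='buttons'),
}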

sibyjackgrove avatar Jan 24 '22 21:01 sibyjackgrove

@cedavidyang Were you able to figure out a fix for this error?

sibyjackgrove avatar Mar 04 '22 03:03 sibyjackgrove

@summer-yue It seems the only way around this is to comment out the following check in ppo_policy.py (near line 112):

    distribution_utils.assert_specs_are_compatible(
        actor_output_spec, action_spec,
        'actor_network output spec does not match action spec')

sibyjackgrove avatar Oct 20 '22 20:10 sibyjackgrove

I encountered the same issue and had to comment out the lines above.

aosama avatar Feb 06 '24 17:02 aosama