agents
agents copied to clipboard
How to customize Actor policy in Actor-Learner setup
Hi, I am following https://github.com/tensorflow/agents/tree/master/tf_agents/experimental/distributed/examples/sac to implement an Actor-Learner environment for DDQN (similar to Ape-X, page 6). How can I adapt the actor-policies that get their policy variables from the reverb variable container? I would like to use a different epsilon-greedy policy for each actor.
Actors are differentiated by FLAGS.task
in the example. I tried to pass an Epsilon-Greedy Polciy to actor.Actor.policy
collect_eps_policy = EpsilonGreedyPolicy(collect_policy, epsilon=1/FLAGS.task)
env_step_metric = py_metrics.EnvironmentSteps()
collect_actor = actor.Actor(
collect_env,
collect_eps_policy,
train_step,
steps_per_run=configs["collectors"]["num_steps_per_collect"],
metrics=actor.collect_metrics(configs["collectors"]["num_steps_per_collect"]),
summary_dir=summary_dir,
observers=[rb_observer, env_step_metric])
but that resulted in the following error:
File "/x/lib/python3.10/site-packages/tf_agents/policies/greedy_policy.py", line 58, in __init__
emit_log_probability=policy.emit_log_probability,
File "/x/python3.10/site-packages/tf_agents/policies/py_tf_eager_policy.py", line 246, in __getattr__
return getattr(self._policy, name)
AttributeError: '_UserObject' object has no attribute 'emit_log_probability'
In call to configurable 'GreedyPolicy' (<class 'tf_agents.policies.greedy_policy.GreedyPolicy'>)
In call to configurable 'EpsilonGreedyPolicy' (<class 'tf_agents.policies.epsilon_greedy_policy.EpsilonGreedyPolicy'>)
In call to configurable 'collect' (<function collect at 0x2b33a2607490>)
Is there a simple way to implement what I want to do?
The DdqnAgent
initializer, as with the DqnAgent
, accepts an epsilon_greedy
argument. You can see in the code here that it uses an EpsilonGreedyPolicy
for its collect policy, provided no Boltzmann temperature is provided:
agent = ddqn_agent.DdqnAgent(
...
epsilon_greedy=0.1)
agent.collect_policy # => EpsilonGreedyPolicy
Does your issue persist after making this change?