agents icon indicating copy to clipboard operation
agents copied to clipboard

How to customize Actor policy in Actor-Learner setup

Open JLenssen opened this issue 2 years ago • 1 comments

Hi, I am following https://github.com/tensorflow/agents/tree/master/tf_agents/experimental/distributed/examples/sac to implement an Actor-Learner environment for DDQN (similar to Ape-X, page 6). How can I adapt the actor-policies that get their policy variables from the reverb variable container? I would like to use a different epsilon-greedy policy for each actor.

Actors are differentiated by FLAGS.task in the example. I tried to pass an Epsilon-Greedy Polciy to actor.Actor.policy

  collect_eps_policy = EpsilonGreedyPolicy(collect_policy, epsilon=1/FLAGS.task)
  env_step_metric = py_metrics.EnvironmentSteps()
  collect_actor = actor.Actor(
      collect_env,
      collect_eps_policy,
      train_step,
      steps_per_run=configs["collectors"]["num_steps_per_collect"],
      metrics=actor.collect_metrics(configs["collectors"]["num_steps_per_collect"]),
      summary_dir=summary_dir,
      observers=[rb_observer, env_step_metric])

but that resulted in the following error:

  File "/x/lib/python3.10/site-packages/tf_agents/policies/greedy_policy.py", line 58, in __init__
    emit_log_probability=policy.emit_log_probability,
  File "/x/python3.10/site-packages/tf_agents/policies/py_tf_eager_policy.py", line 246, in __getattr__
    return getattr(self._policy, name)
  AttributeError: '_UserObject' object has no attribute 'emit_log_probability'
  In call to configurable 'GreedyPolicy' (<class 'tf_agents.policies.greedy_policy.GreedyPolicy'>)
  In call to configurable 'EpsilonGreedyPolicy' (<class 'tf_agents.policies.epsilon_greedy_policy.EpsilonGreedyPolicy'>)
  In call to configurable 'collect' (<function collect at 0x2b33a2607490>)

Is there a simple way to implement what I want to do?

JLenssen avatar Oct 31 '22 21:10 JLenssen

The DdqnAgent initializer, as with the DqnAgent, accepts an epsilon_greedy argument. You can see in the code here that it uses an EpsilonGreedyPolicy for its collect policy, provided no Boltzmann temperature is provided:

agent = ddqn_agent.DdqnAgent(
...
epsilon_greedy=0.1)

agent.collect_policy # => EpsilonGreedyPolicy

Does your issue persist after making this change?

coreyleveen avatar Nov 29 '22 16:11 coreyleveen