Loaded policy eval runs 4 times faster than original policy eval
I have built an RL solution largely based on this tutorial: https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial
After training finishes, I run eval once. Then I save the policy, load it back, and run eval again. The timing differs by roughly 400%. Is this expected? Is there a reasonable explanation?
This is the original eval:
from timeit import default_timer as timer
from datetime import timedelta

start = timer()
avg_return, total_rewards = policy_eval(eval_env, agent.policy, total_eval_episodes)
end = timer()
print('{2} | steps = {0:6}: Average Return = {1:<+9e}, per step: {3}'.format(
    total_eval_episodes, avg_return, timedelta(seconds=end-start),
    timedelta(seconds=(end-start)/total_eval_episodes)))
Out:
0:25:27.958856 | steps = 1000: Average Return = -4.910102e-07, per step: 0:00:01.527959
This is the save/load:
import tensorflow as tf
from tf_agents.policies import policy_saver

tf_policy_saver = policy_saver.PolicySaver(agent.policy)
tf_policy_saver.save(policy_dir)
. . .
saved_policy = tf.saved_model.load(policy_dir)
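For reference, the loaded SavedModel exposes the same calls used below (action() and get_initial_state()), so it can be passed straight into policy_eval. A minimal sanity check, assuming the same eval_env used for the timed runs:

# Sketch: one action() call on the loaded policy, assuming `eval_env`
# and `saved_policy` from above.
time_step = eval_env.reset()
policy_state = saved_policy.get_initial_state(eval_env.batch_size)
action_step = saved_policy.action(time_step, policy_state)
print(action_step.action)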
Eval using loaded policy:
start = timer()
avg_return2, total_rewards2 = policy_eval(eval_env, saved_policy, total_eval_episodes)
end = timer()
print('{2} | Saved policy: steps = {0:6}: Average Return = {1:<+9e}, per step: {3}'.format(
    total_eval_episodes, avg_return2, timedelta(seconds=end-start),
    timedelta(seconds=(end-start)/total_eval_episodes)))
Out:
0:03:47.331221 | Saved policy: steps = 1000: Average Return = -7.780847e-07, per step: 0:00:00.227331
Eval function (almost the same as in the tutorial):
def policy_eval(environment, policy, num_episodes=10):
    total_return = 0.0
    episode_returns = []
    # Initial policy state is created once and carried across episodes.
    policy_state = policy.get_initial_state(environment.batch_size)
    for _ in range(num_episodes):
        time_step = environment.reset()
        while not time_step.is_last():
            action_step = policy.action(time_step, policy_state)
            policy_state = action_step.state
            time_step = environment.step(action_step.action)
            # Accumulate the reward of every step; episode_returns keeps the
            # per-step rewards returned to the caller as total_rewards.
            total_return += time_step.reward
            episode_returns.append(time_step.reward)
    avg_return = total_return / num_episodes
    return avg_return.numpy()[0], episode_returns
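To narrow down where the per-step difference comes from, something like the following sketch times only the policy.action() call for each policy, leaving the environment stepping out of the measurement (time_policy_action is a hypothetical helper; it assumes eval_env, agent and saved_policy from above):

from timeit import default_timer as timer

# Sketch: average cost of a single action() call for the in-memory
# policy vs. the loaded SavedModel policy, independent of env stepping.
def time_policy_action(policy, n_calls=100):
    time_step = eval_env.reset()
    policy_state = policy.get_initial_state(eval_env.batch_size)
    start = timer()
    for _ in range(n_calls):
        action_step = policy.action(time_step, policy_state)
        policy_state = action_step.state
    return (timer() - start) / n_calls

print('agent.policy:', time_policy_action(agent.policy))
print('saved_policy:', time_policy_action(saved_policy))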