agents

EnvironmentSteps tf_metric bug with parallel envs

Open • vittorione94 opened this issue 1 year ago • 1 comment

I think there's a bug in the TensorFlow EnvironmentSteps metric when it is used with parallel environments.

Say we're using a parallel environment (with 10 envs) and set collect_steps_per_iteration to 5 in a DynamicStepDriver. I would expect the metric to return 50 (10 envs × 5 steps) after the driver's run function finishes, but it returns 10. To reproduce this easily, take one of the example files, such as ddpg, and set these two parameters. With a single (non-parallel) env, however, it works fine and correctly returns 5.
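For comparison, the reported numbers (10 with ten envs, 5 with one env) are exactly what you would get if the step count passed to the driver were a total summed across all parallel environments rather than a per-environment count. As I understand the DynamicStepDriver documentation, its num_steps argument is described that way for batched or parallel environments, but treat this as an assumption to check against your installed TF-Agents version. The snippet below is a simplified pure-Python model of that counting rule, not TF-Agents code; simulate_driver_steps is a hypothetical name, and the model assumes no episode boundaries occur.

```python
# Simplified model (NOT TF-Agents code) of a step-counting driver loop where
# num_steps is the total summed across all batched/parallel environments.
# Assumption: no episode boundaries occur, so every step is counted.


def simulate_driver_steps(num_envs: int, num_steps: int) -> int:
    """Return the total environment steps taken before the loop stops.

    Each loop iteration steps every environment in the batch once, so the
    total grows by num_envs per iteration. The loop continues only while the
    summed count is still below num_steps.
    """
    if num_envs < 1:
        raise ValueError("num_envs must be at least 1")
    per_env_counter = [0] * num_envs
    while sum(per_env_counter) < num_steps:
        # One batched step: every environment advances by exactly one step.
        per_env_counter = [c + 1 for c in per_env_counter]
    return sum(per_env_counter)


if __name__ == "__main__":
    print(simulate_driver_steps(num_envs=10, num_steps=5))   # 10
    print(simulate_driver_steps(num_envs=1, num_steps=5))    # 5
    print(simulate_driver_steps(num_envs=10, num_steps=50))  # 50
```

Under this model, one batched step of 10 envs already exceeds a target of 5, so the loop stops at 10; getting 50 steps per run with 10 parallel envs would require a target of 50 (10 × 5), not 5.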

Could anyone look into this, or explain whether there's something wrong with my reasoning?

Best, -Vittorio

vittorione94, Oct 25 '22 16:10

I believe the metric only tracks train steps, not the steps collected by the driver. That would make sense: if your initial collect driver ran, say, 100,000 steps to partially fill the replay buffer, an EnvironmentSteps value of 100,000 before the agent had even started training could be misleading.

Perhaps try setting train_steps_per_iteration to 5 as well and see whether the metric then changes the way you expected.

coreyleveen, Nov 05 '22 02:11