D4PG

How to set Vmin and Vmax for the other MuJoCo tasks?

Open yingnan-rl opened this issue 5 years ago • 2 comments

Is it based on the cumulative reward?

yingnan-rl, Apr 11 '19 09:04

Yes, it is based on the expected cumulative reward from the environment. For Pendulum the reward is never positive, so it's easy to set Vmax to 0. Finding Vmin takes a bit of trial and error, as it's simply another hyperparameter. The strategy I used to find this lower bound empirically is to run a random agent in the environment and log the reward at each state. You can then estimate the minimum value by calculating the cumulative future rewards from each state and taking the smallest. This should give you a good lower bound on your value range: it is unlikely you will ever have an agent that performs worse than a random agent, and the values should increase as soon as you start training. You could use the same strategy to find bounds for other environments such as the MuJoCo tasks.
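The random-agent strategy described above could be sketched roughly as follows. This is not code from the repository, only an illustration under some assumptions: the environment exposes a Gym-style `reset()`, `step()` and `action_space.sample()`, the returns are discounted with a factor `gamma` (the comment above only says "cumulative future rewards"), and the names `discounted_returns` and `estimate_vmin` are made up for this example.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def estimate_vmin(env, episodes=20, gamma=0.99, max_steps=1000):
    """Estimate a lower value bound from random rollouts (hypothetical helper).

    Assumes a Gym-style env; gamma and max_steps are illustrative defaults,
    not values taken from the original thread.
    """
    lowest = np.inf
    for _ in range(episodes):
        env.reset()
        rewards = []
        for _ in range(max_steps):
            out = env.step(env.action_space.sample())  # random action
            reward = out[1]
            # Older Gym returns (obs, reward, done, info); newer versions
            # return (obs, reward, terminated, truncated, info).
            done = out[2] if len(out) == 4 else (out[2] or out[3])
            rewards.append(float(reward))
            if done:
                break
        # Smallest return-to-go seen from any state in this episode.
        lowest = min(lowest, discounted_returns(rewards, gamma).min())
    return float(lowest)
```

The number this returns is only a starting point for Vmin. Since it is still a hyperparameter, it may need some extra margin or further tuning per environment.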

msinto93, Apr 29 '19 13:04

Is it based on the cumulative reward?

Did you find the most appropriate parameters for the MuJoCo tasks?

zienn, Apr 20 '20 17:04