Gym.jl
PendulumEnv does not use clamped torque
PendulumEnv calculates a clamped torque but then uses the unclamped torque in the subsequent calculations. That is, we calculate:
v = clamp.(u, -env.max_torque, env.max_torque)
but v is not used in any of the following lines; u is used directly instead.
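For illustration, here is a minimal sketch of a step that does use v. It assumes the dynamics follow the classic OpenAI Gym pendulum, which I have not checked against the Gym.jl source line by line. `PendulumParams`, `pendulum_step`, and the parameter values are hypothetical stand-ins, not Gym.jl's actual code.

```julia
# Hypothetical parameter container; NOT Gym.jl's actual PendulumEnv struct.
struct PendulumParams
    max_speed::Float64
    max_torque::Float64
    dt::Float64
    g::Float64
    m::Float64
    l::Float64
end

# Assumed defaults, taken from the classic OpenAI Gym pendulum.
PendulumParams() = PendulumParams(8.0, 2.0, 0.05, 10.0, 1.0, 1.0)

# Wrap an angle into [-pi, pi).
angle_normalize(x) = mod(x + pi, 2 * pi) - pi

function pendulum_step(p::PendulumParams, th, thdot, u)
    # Clamp the requested torque once ...
    v = clamp(u, -p.max_torque, p.max_torque)

    # ... and use v (not u) everywhere below.
    cost = angle_normalize(th)^2 + 0.1 * thdot^2 + 0.001 * v^2
    thdot_new = thdot + (-3 * p.g / (2 * p.l) * sin(th + pi) +
                         3 / (p.m * p.l^2) * v) * p.dt
    th_new = th + thdot_new * p.dt
    thdot_new = clamp(thdot_new, -p.max_speed, p.max_speed)

    return th_new, thdot_new, -cost
end
```

With this version, `pendulum_step(PendulumParams(), 0.1, 0.0, 10.0)` and `pendulum_step(PendulumParams(), 0.1, 0.0, 2.0)` return the same result, because any torque beyond `max_torque` is cut off before it reaches the dynamics or the cost.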
Thanks for pointing this out. I tried changing it and running the examples from model-zoo, and with that change the model has difficulty learning. The gradients vanish because of the clamp, which may be why v was never used. I noticed that without using v, the model still learns to output values within the given torque range. I'm experimenting with workarounds to get it working with v.
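To make the vanishing-gradient point concrete, here is a small sketch that estimates the derivative of the clamp with a finite difference, so it needs no AD package. `max_torque = 2.0`, `fd`, `hard`, and `soft` are names and values assumed for the illustration. The `tanh` squashing is only one commonly used alternative to a hard clamp; it is not necessarily the workaround being tried here.

```julia
# Assumed torque limit for the illustration.
const max_torque = 2.0

# Central finite-difference estimate of f'(x).
fd(f, x; h = 1e-6) = (f(x + h) - f(x - h)) / (2 * h)

# Hard clamp, as in the line quoted in the issue.
hard(u) = clamp(u, -max_torque, max_torque)

# Smooth squashing into (-max_torque, max_torque); an alternative, not Gym.jl code.
soft(u) = max_torque * tanh(u / max_torque)

println("d hard / du at u = 0.5: ", fd(hard, 0.5))  # inside the range: close to 1
println("d hard / du at u = 3.0: ", fd(hard, 3.0))  # outside the range: 0.0
println("d soft / du at u = 3.0: ", fd(soft, 3.0))  # small but nonzero
```

Once the policy output leaves the torque range, the hard clamp passes no gradient back at all, whereas the smooth version still passes a small one. Note that `soft` changes the applied torque inside the range too, so it is not a drop-in replacement for the clamp.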