dm_control
Reward is always 0 in Hopper
In Hopper, the reward is always 0, independent of the action. The problem is that this code, used in the reward calculation in the get_reward(self, physics) function in hopper.py, always returns 0 as the value of standing:
standing = rewards.tolerance(physics.height(), (_STAND_HEIGHT, 2))
When I check env._physics.height() after an environment reset, I see that the height is always initialized to a negative value. Therefore it is impossible for this code fragment, rewards.tolerance(physics.height(), (_STAND_HEIGHT, 2)), to return 1, since _STAND_HEIGHT is defined as 0.6 for Hopper.
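To illustrate why a negative height makes the standing reward 0, here is a minimal sketch of the relevant behavior of dm_control's rewards.tolerance with its default margin=0 (in that case the reward is simply 1.0 inside the bounds and 0.0 outside). This is a simplified stand-in, not the library's full implementation:

```python
_STAND_HEIGHT = 0.6  # value defined in hopper.py

def tolerance(x, bounds):
    """Simplified sketch of rewards.tolerance with margin=0:
    returns 1.0 if x lies within [lower, upper], else 0.0."""
    lower, upper = bounds
    return 1.0 if lower <= x <= upper else 0.0

# A negative height, as observed after reset, can never lie inside
# (_STAND_HEIGHT, 2), so the standing term of the reward is always 0.
print(tolerance(-0.05, (_STAND_HEIGHT, 2)))  # 0.0
print(tolerance(1.0, (_STAND_HEIGHT, 2)))    # 1.0
```

So as long as physics.height() stays negative, the standing term (and hence the whole reward) is pinned at 0 regardless of the action taken.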
Hi! Did you solve this problem? Much appreciated.