baselines
Question on MuJoCo reward
When training ppo2 on a MuJoCo environment, I find that the episode reward reported in infos['episode']['r'] does not equal the sum of the per-step rewards. In the Humanoid environment, summing the step rewards gives an episode reward of only 10, while the true episode reward is about 300. Is there a problem with reward scaling?
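To make the comparison concrete, here is a minimal, self-contained sketch of one possible explanation. It is not the baselines code: `RunningVar` and `run_episode` are hypothetical names, the reward stream is synthetic, and the sketch assumes that the monitor logs raw rewards while the training loop receives rewards divided by a running standard deviation of the discounted return.

```python
import math
import random


class RunningVar:
    """Running mean/variance via Welford's algorithm (hypothetical helper)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Fall back to unit variance until there are at least two samples.
        return self.m2 / self.count if self.count > 1 else 1.0


def run_episode(num_steps=200, gamma=0.99, clip=10.0, eps=1e-8, seed=0):
    """Return (raw_sum, scaled_sum) for one synthetic episode.

    raw_sum    : what a monitor-style wrapper would log (assumption).
    scaled_sum : what you get by summing the normalized step rewards.
    """
    rng = random.Random(seed)
    ret_stats = RunningVar()
    discounted_ret = 0.0
    raw_sum = 0.0
    scaled_sum = 0.0
    for _ in range(num_steps):
        # Synthetic per-step reward; stands in for the environment.
        raw = 5.0 + rng.uniform(-0.5, 0.5)
        raw_sum += raw

        # Track the running discounted return and its variance.
        discounted_ret = discounted_ret * gamma + raw
        ret_stats.update(discounted_ret)

        # Scale the step reward by the return's running std, then clip.
        scaled = raw / math.sqrt(ret_stats.var + eps)
        scaled = max(-clip, min(clip, scaled))
        scaled_sum += scaled
    return raw_sum, scaled_sum


if __name__ == "__main__":
    raw_sum, scaled_sum = run_episode()
    print("monitor-style episode reward:", raw_sum)
    print("sum of scaled step rewards:  ", scaled_sum)
```

If something like this is happening, the two sums would be expected to agree once the normalization step is bypassed, which would be one way to check the hypothesis.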
Looking forward to your response. Thanks.