
train/enjoy_husky_gibson_flagrun.py issues!

Berk035 opened this issue on Dec 10, 2019 · 4 comments

Hello everyone,

I have been studying the husky flagrun algorithm for a long time and I have some problems with it. Despite trying everything, the agent is not able to learn how to go to the cube (target).

  • First of all, I couldn't understand the reward function, which contains only alive_score, progress, and obstacle_dist. There is no close_to_target term that would drive the agent toward the target.

  • Second, the target location does not seem to change in any file. There are only two lines in _flag_reposition, such as self.walk_to_target = ball_xyz, and they do not seem to contribute to the reward function or the learning process.

  • Last, there is a sentence in the paper: "We trained a perceptual and non-perceptual husky agent according to the setting in Sec. 4.1 with PPO [78] for 150 episodes (300 iterations, 150k frames)." Is the correct calculation 150k frames / 300 iterations = 500 timesteps × batch? The timesteps-and-batch product seems too low.
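For reference, the arithmetic implied by the quoted numbers (nothing beyond the figures in that sentence) works out as:

150,000 frames / 300 iterations = 500 frames per iteration
150,000 frames / 150 episodes = 1,000 frames per episode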

I would be grateful for answers to these questions. Thanks.

Berk035 · Dec 10, 2019

  • Reward: alive_score is a reward term that prevents the agent from tipping over; progress is the difference of the potential function between two consecutive timesteps (a dense reward); obstacle_dist penalizes getting too close to an obstacle. (A sketch of how these terms combine is given after this list.)

  • The target location is changed in _flag_reposition(): that function applies a random force to the red cube and throws it within the room, which moves the target.

  • The policy is able to converge with a small number of environment steps because it receives ground truth localization, i.e. the agent knows where the target is and only needs to perform local planning/obstacle avoidance.
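For concreteness, here is a minimal sketch of how the three reward terms above can be combined, plus a flag-reposition helper in the spirit of _flag_reposition(). The function names, constants, and the pybullet-based repositioning are illustrative assumptions, not GibsonEnv's exact implementation:

```python
import numpy as np
import pybullet as p

def potential(robot_xy, target_xy):
    # Negative Euclidean distance to the flag: closer => higher potential.
    return -float(np.linalg.norm(np.asarray(target_xy) - np.asarray(robot_xy)))

def step_reward(prev_potential, curr_potential, alive, obstacle_dist,
                alive_bonus=1.0, dead_penalty=-1.0,
                obstacle_margin=0.3, obstacle_penalty=-1.0):
    # alive_score: keep the robot upright (episode ends when it tips over).
    alive_score = alive_bonus if alive else dead_penalty
    # progress: difference of the potential between two consecutive timesteps (dense).
    progress = curr_potential - prev_potential
    # obstacle term: penalize getting too close to the nearest obstacle.
    obstacle = obstacle_penalty if obstacle_dist < obstacle_margin else 0.0
    return alive_score + progress + obstacle

def reposition_flag(cube_body_id, force_scale=500.0):
    # Throw the red cube with a random horizontal force so the target moves,
    # in the spirit of _flag_reposition().
    fx, fy = np.random.uniform(-force_scale, force_scale, size=2)
    pos, _ = p.getBasePositionAndOrientation(cube_body_id)
    p.applyExternalForce(cube_body_id, -1, [fx, fy, 0.0], list(pos), p.WORLD_FRAME)
```

The point is that progress already pulls the agent toward the flag, so no separate close_to_target term is needed.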

fxia22 · Dec 10, 2019

Can you plot your reward curve during your training process? This would be insightful! Thanks.

fxia22 · Dec 10, 2019

> Can you plot your reward curve during your training process? This would be insightful! Thanks.

Thank you for your quick response, Fei. You are awesome :) I know about the rewards, but according to the enjoy results the agent still couldn't reach the target. I also tried training after adding self.robot.set_target_position(ball_xyz). Anyway, I will plot my results in a few minutes. Thank you.

Berk035 · Dec 10, 2019

[Figure_1: reward curve from training]

Timesteps: 600, Episodes: 20, Iterations: 250

Berk035 · Dec 10, 2019