
The rule of collision penalty

Open · tongtybj opened this issue 2 years ago • 7 comments

@yun-long

Hi yunlong, thank you for developing this amazing repository. There is one thing that needs your double-check, which is the rule of the collision penalty: https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300

I have confirmed that, with this code, the result of relative_dist is always either 1 or max_detection_range_.
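
To illustrate what I mean, here is a simplified, self-contained sketch (shortened names, not the exact expression in vision_env.cpp) of how a misplaced ?: can only ever produce 1 or max_detection_range_:

    #include <iostream>

    int main() {
      // Simplified stand-ins for the members used in vision_env.cpp.
      const double max_detection_range_ = 10.0;
      const double relative_pos_norm = 3.2;  // hypothetical measured distance

      // When the ?: is placed like this, the second comparison becomes the
      // "true" branch, so the whole expression evaluates either to a bool
      // (promoted to 1.0) or to max_detection_range_, never to the distance.
      double relative_dist = relative_pos_norm > 0
                                 ? relative_pos_norm < max_detection_range_
                                 : max_detection_range_;

      std::cout << relative_dist << std::endl;  // prints 1, not 3.2
    }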

I am not sure whether you wrote it like this intentionally; maybe the following would be better?

    Scalar relative_dist =
        (relative_pos_norm_[sort_idx] > 0) && (relative_pos_norm_[sort_idx] < max_detection_range_)
            ? relative_pos_norm_[sort_idx]
            : max_detection_range_;

tongtybj avatar May 01 '22 11:05 tongtybj

hi,

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

If the actual relative dist is larger than the detection range, relative_dist is clipped to the maximum detection range; otherwise, it is the same as the actual distance.
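
In other words, the intended rule is a simple clip. A minimal sketch (simplified names, not the repository code):

    #include <algorithm>  // std::min
    #include <initializer_list>
    #include <iostream>

    int main() {
      const double max_detection_range = 10.0;
      for (double actual : {3.2, 25.0}) {  // hypothetical measured distances
        // Distances beyond the detection range are clipped to it;
        // anything closer is kept as the actual distance.
        double relative_dist = std::min(actual, max_detection_range);
        std::cout << actual << " -> " << relative_dist << std::endl;  // 3.2 -> 3.2, 25 -> 10
      }
    }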

yun-long avatar May 02 '22 10:05 yun-long

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

OK. Then you should double-check the syntax at https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300. I mean, the usage of ?: should be condition ? case1 : case2.

tongtybj avatar May 02 '22 10:05 tongtybj

ohhhhhhhhhhh, you are absolutely right. sorry.

yun-long avatar May 02 '22 11:05 yun-long

thanks a lot @tongtybj

yun-long avatar May 02 '22 12:05 yun-long

@yun-long

You are welcome.

Actually, I also trained with the true std::exp(-1.0 * relative_dist) model, but got a worse result. So I wondered whether you had written it this way intentionally.
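
For context, here is a quick sketch (hypothetical distances, simplified names, not the repository code) of how differently the std::exp(-1.0 * relative_dist) term behaves with the bugged value versus the true distance:

    #include <cmath>
    #include <initializer_list>
    #include <iostream>

    int main() {
      for (double dist : {0.5, 2.0, 8.0}) {  // hypothetical obstacle distances
        // With the bug, relative_dist is either 1 or max_detection_range_;
        // taking the value 1 gives a constant penalty term exp(-1) ~= 0.368.
        double bugged = std::exp(-1.0 * 1.0);
        // With the fix, the penalty term actually depends on the distance.
        double corrected = std::exp(-1.0 * dist);
        std::cout << dist << ": " << bugged << " vs " << corrected << std::endl;
      }
    }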

tongtybj avatar May 02 '22 12:05 tongtybj

I didn't tune the reward. I am not surprised that the result is not good.

Some general suggestions are

  • Tune the reward; check the learning curve not only for the total reward but also for each individual reward. Each individual reward component is logged; you can visualize the learning curves with cd ./saved and tensorboard --logdir=./ (a small sketch of this per-component structure follows after this list).
  • Use different policy representations. Currently, the policy is represented via a multilayer perceptron, which is not a good representation for dynamic environments. Consider using a memory-based network, such as an RNN/LSTM/GRU/TCN.
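
Regarding the first point, here is a minimal sketch (illustrative component names, not the identifiers used in flightlib) of the per-component reward structure the logging assumes: each named component is tracked separately and they sum to the total reward.

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
      // Hypothetical per-step reward decomposition; names are illustrative.
      std::map<std::string, double> reward_components = {
          {"progress_reward", 0.42},
          {"collision_penalty", -0.15},
          {"command_smoothness", -0.03}};

      double total_reward = 0.0;
      for (const auto& [name, value] : reward_components) {
        total_reward += value;
        // Each component is printed (in training it would be logged) so its
        // learning curve can be inspected alongside the total reward.
        std::cout << name << ": " << value << std::endl;
      }
      std::cout << "total: " << total_reward << std::endl;
    }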

yun-long avatar May 02 '22 12:05 yun-long

Thanks a lot for your important advice!

tongtybj avatar May 02 '22 13:05 tongtybj