
The rule of collision penalty

Open · tongtybj opened this issue 2 years ago • 7 comments

@yun-long

Hi yunlong, thank you for developing this amazing repository. There is one thing that needs your double-check, which is the rule of the collision penalty: https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300

I have confirmed that, with this code, the result of relative_dist is always either 1 or max_detection_range_.
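
To illustrate what I mean, here is a simplified, self-contained sketch (shortened names, not the exact expression in vision_env.cpp) of how a misplaced ?: can only ever produce 1 or max_detection_range_:

    #include <iostream>

    int main() {
      // Simplified stand-ins for the members used in vision_env.cpp.
      const double max_detection_range_ = 10.0;
      const double relative_pos_norm = 3.2;  // hypothetical measured distance

      // When the ?: is placed like this, the second comparison becomes the
      // "true" branch, so the whole expression evaluates either to a bool
      // (promoted to 1.0) or to max_detection_range_, never to the distance.
      double relative_dist = relative_pos_norm > 0
                                 ? relative_pos_norm < max_detection_range_
                                 : max_detection_range_;

      std::cout << relative_dist << std::endl;  // prints 1, not 3.2
    }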

I am not sure whether you wrote it like this intentionally; maybe the following would be better?

    Scalar relative_dist =
        (relative_pos_norm_[sort_idx] > 0) && (relative_pos_norm_[sort_idx] < max_detection_range_)
            ? relative_pos_norm_[sort_idx]
            : max_detection_range_;

tongtybj avatar May 01 '22 11:05 tongtybj

hi,

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

If the actual relative dist is larger than the detection range, relative_dist is clipped to the maximum detection range; otherwise, it is the same as the actual distance.
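
In other words, the intended rule is a simple clip. A minimal sketch (simplified names, not the repository code):

    #include <algorithm>  // std::min
    #include <initializer_list>
    #include <iostream>

    int main() {
      const double max_detection_range = 10.0;
      for (double actual : {3.2, 25.0}) {  // hypothetical measured distances
        // Distances beyond the detection range are clipped to it;
        // anything closer is kept as the actual distance.
        double relative_dist = std::min(actual, max_detection_range);
        std::cout << actual << " -> " << relative_dist << std::endl;  // 3.2 -> 3.2, 25 -> 10
      }
    }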

yun-long avatar May 02 '22 10:05 yun-long

The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.

OK. Then you should double-check the syntax at https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300. I mean, the usage of ?: should be condition ? case1 : case2.

tongtybj avatar May 02 '22 10:05 tongtybj

ohhhhhhhhhhh, you are absolutely right. sorry.

yun-long avatar May 02 '22 11:05 yun-long

thanks a lot @tongtybj

yun-long avatar May 02 '22 12:05 yun-long

@yun-long

You are welcome.

Actually, I also trained with the true std::exp(-1.0 * relative_dist) model, but got a worse result. So I wondered whether you had written it this way intentionally.
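
For context, here is a quick sketch (hypothetical distances, simplified names, not the repository code) of how differently the std::exp(-1.0 * relative_dist) term behaves with the bugged value versus the true distance:

    #include <cmath>
    #include <initializer_list>
    #include <iostream>

    int main() {
      for (double dist : {0.5, 2.0, 8.0}) {  // hypothetical obstacle distances
        // With the bug, relative_dist is either 1 or max_detection_range_;
        // taking the value 1 gives a constant penalty term exp(-1) ~= 0.368.
        double bugged = std::exp(-1.0 * 1.0);
        // With the fix, the penalty term actually depends on the distance.
        double corrected = std::exp(-1.0 * dist);
        std::cout << dist << ": " << bugged << " vs " << corrected << std::endl;
      }
    }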

tongtybj avatar May 02 '22 12:05 tongtybj

I didn't tune the reward. I am not surprised that the result is not good.

Some general suggestions are

  • Tune the reward; check the learning curve not only for the total reward but also for each individual reward. Each individual reward component is logged; you can visualize the learning curves with cd ./saved and tensorboard --logdir=./ (a small sketch of this per-component structure follows after this list).
  • Use different policy representations. Currently, the policy is represented via a multilayer perceptron, which is not a good representation for dynamic environments. Consider using a memory-based network, such as an RNN/LSTM/GRU/TCN.
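
Regarding the first point, here is a minimal sketch (illustrative component names, not the identifiers used in flightlib) of the per-component reward structure the logging assumes: each named component is tracked separately and they sum to the total reward.

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
      // Hypothetical per-step reward decomposition; names are illustrative.
      std::map<std::string, double> reward_components = {
          {"progress_reward", 0.42},
          {"collision_penalty", -0.15},
          {"command_smoothness", -0.03}};

      double total_reward = 0.0;
      for (const auto& [name, value] : reward_components) {
        total_reward += value;
        // Each component is printed (in training it would be logged) so its
        // learning curve can be inspected alongside the total reward.
        std::cout << name << ": " << value << std::endl;
      }
      std::cout << "total: " << total_reward << std::endl;
    }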

yun-long avatar May 02 '22 12:05 yun-long

Thanks a lot for your important advice!

tongtybj avatar May 02 '22 13:05 tongtybj