agile_flight
The rule of collision penalty
@yun-long
Hi @yun-long, thank you for developing this amazing repository. There is one thing that needs your double-check, which is the rule of the collision penalty: https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300
I have confirmed that with this code, the result of `relative_dist` is always either 1 or `max_detection_range_`. I am not sure whether you coded it like this intentionally; if not, perhaps the following would be better:
```cpp
Scalar relative_dist = (relative_pos_norm_[sort_idx] > 0) &&
                       (relative_pos_norm_[sort_idx] < max_detection_range_)
                           ? relative_pos_norm_[sort_idx]
                           : max_detection_range_;
```
hi,
The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle. If the actual relative dist is larger than the detection range, relative dist is clipped to the maximum detection range; otherwise, it is the same as the actual dist.
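A minimal sketch of that intended clipping behavior (`Scalar` and the names here stand in for the flightlib types; this is an illustration, not the repository code):

```cpp
#include <iostream>

using Scalar = double;  // assumption: flightlib's Scalar is a floating-point alias

// Intended behavior: in-range distances pass through unchanged,
// anything else saturates at the maximum detection range.
Scalar clipRelativeDist(Scalar raw_dist, Scalar max_detection_range) {
  return (raw_dist > 0 && raw_dist < max_detection_range)
             ? raw_dist              // in range: keep the actual distance
             : max_detection_range;  // out of range: clip
}

int main() {
  // With a 10 m detection range, 3.2 m passes through and 15 m is clipped.
  std::cout << clipRelativeDist(3.2, 10.0) << "\n";   // prints 3.2
  std::cout << clipRelativeDist(15.0, 10.0) << "\n";  // prints 10
}
```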
> The relative dist is a measurement of the Euclidean distance between the drone center and the obstacle.
OK. Then you have to check https://github.com/uzh-rpg/flightmare/blob/092ff357139b2e98fc92bcdee50f38f85b55246d/flightlib/src/envs/vision_env/vision_env.cpp#L296-L300 grammatically. I mean the usage of the ternary operator `? :`, which should be `condition ? case1 : case2`.
ohhhhhhhhhhh, you are absolutely right. sorry.
thanks a lot @tongtybj
@yun-long
You are welcome.
Actually, I also trained a model with the true `std::exp(-1.0 * relative_dist)` penalty, but got a worse result. That is why I wondered whether you had written it this way intentionally.
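For reference, a minimal sketch of why the two versions behave differently (assuming the collision penalty is proportional to `std::exp(-1.0 * relative_dist)`, as above; names and values are illustrative, not the repository code). With the buggy distance, which collapses to 1 for any in-range obstacle, the penalty is constant and carries no information about how close the drone actually is:

```cpp
#include <cmath>
#include <iostream>

using Scalar = double;  // assumption: flightlib's Scalar is a floating-point alias

int main() {
  const Scalar max_detection_range = 10.0;
  for (Scalar actual_dist : {0.5, 2.0, 5.0, 9.0}) {
    // Buggy version: relative_dist degenerates to 1 whenever the obstacle is in range.
    Scalar buggy_dist = 1.0;
    // Fixed version: the actual distance, clipped to the detection range.
    Scalar fixed_dist = (actual_dist > 0 && actual_dist < max_detection_range)
                            ? actual_dist
                            : max_detection_range;
    std::cout << "actual dist " << actual_dist
              << " | buggy penalty " << std::exp(-1.0 * buggy_dist)   // constant exp(-1)
              << " | fixed penalty " << std::exp(-1.0 * fixed_dist)   // decays with distance
              << "\n";
  }
}
```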
I didn't tune the reward. I am not surprised that the result is not good.
Some general suggestions:
- Tune the reward, and check the learning curve not only for the total reward but also for each individual reward component. Each component is logged; you can visualize the learning curves with

```
cd ./saved
tensorboard --logdir=./
```

- Use a different policy representation. Currently, the policy is represented by a multilayer perceptron (MLP), which is not a good representation for dynamic environments. Consider using a memory-based network such as an RNN/LSTM/GRU/TCN; a minimal sketch follows after this list.
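As a sketch of the memory idea, here is a single GRU cell in Eigen: the hidden state `h` carries information from past observations forward in time, which a feedforward MLP cannot do. This is illustrative only; the dimensions, names, and random weights are assumptions, not the agile_flight training code.

```cpp
#include <Eigen/Dense>
#include <iostream>

using Vec = Eigen::VectorXd;
using Mat = Eigen::MatrixXd;

// One GRU step. The hidden state h is what gives the policy memory:
//   z  = sigmoid(Wz x + Uz h + bz)        (update gate)
//   r  = sigmoid(Wr x + Ur h + br)        (reset gate)
//   h~ = tanh(Wh x + Uh (r .* h) + bh)    (candidate state)
//   h' = (1 - z) .* h + z .* h~
struct GRUCell {
  Mat Wz, Uz, Wr, Ur, Wh, Uh;
  Vec bz, br, bh;

  static Vec sigmoid(const Vec& v) {
    return (1.0 + (-v.array()).exp()).inverse().matrix();
  }

  Vec step(const Vec& x, const Vec& h) const {
    Vec z = sigmoid(Wz * x + Uz * h + bz);
    Vec r = sigmoid(Wr * x + Ur * h + br);
    Vec h_cand =
        (Wh * x + Uh * (r.array() * h.array()).matrix() + bh).array().tanh().matrix();
    return ((1.0 - z.array()) * h.array() + z.array() * h_cand.array()).matrix();
  }
};

int main() {
  const int obs_dim = 4, hidden_dim = 8;  // illustrative sizes
  GRUCell cell{Mat::Random(hidden_dim, obs_dim), Mat::Random(hidden_dim, hidden_dim),
               Mat::Random(hidden_dim, obs_dim), Mat::Random(hidden_dim, hidden_dim),
               Mat::Random(hidden_dim, obs_dim), Mat::Random(hidden_dim, hidden_dim),
               Vec::Zero(hidden_dim), Vec::Zero(hidden_dim), Vec::Zero(hidden_dim)};

  Vec h = Vec::Zero(hidden_dim);
  for (int t = 0; t < 3; ++t) {
    Vec obs = Vec::Random(obs_dim);  // placeholder observation
    h = cell.step(obs, h);           // hidden state persists across timesteps
  }
  std::cout << "final hidden state:\n" << h << "\n";
}
```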
Thanks a lot for your important advice!