gym-pybullet-drones
Definition of reward function?
Hi, Jacopo! I have been looking at the source code of the FlyThruGateAviary, but could not figure out how the reward function is defined for each step. Could you kindly elaborate a bit? Thanks a lot!!!
```python
def _computeReward(self):
    """Computes the current reward value.

    Returns
    -------
    float
        The reward.

    """
    state = self._getDroneStateVector(0)
    norm_ep_time = (self.step_counter/self.SIM_FREQ) / self.EPISODE_LEN_SEC
    return -10 * np.linalg.norm(np.array([0, -2*norm_ep_time, 0.75])-state[0:3])**2
```
Hi @Gariscat
It has been a while since I wrote that, but the line simply gives a negative reward proportional to the squared distance of the drone from a target position (x, y, z). The additional hack is that the target's y coordinate shifts linearly from 0 to -2 over the course of the episode, dragging the drone through the gate. There are certainly many other ways to do it.
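For concreteness, here is a minimal, self-contained sketch of that moving-target reward outside the class (the helper names `moving_target` and `reward` are mine, just for illustration):

```python
import numpy as np

def moving_target(norm_ep_time):
    """Target position as a function of the elapsed episode fraction in [0, 1]."""
    # x and z are fixed; y drifts linearly from 0 to -2 as the episode progresses.
    return np.array([0.0, -2.0 * norm_ep_time, 0.75])

def reward(drone_xyz, norm_ep_time):
    """Negative squared Euclidean distance to the moving target, scaled by 10."""
    return -10.0 * np.linalg.norm(moving_target(norm_ep_time) - drone_xyz) ** 2

print(reward(np.array([0.0, 0.0, 0.75]), 0.0))  # 0.0: drone sits on the initial target
print(reward(np.array([0.0, 0.0, 0.75]), 1.0))  # -40.0: target has moved to y = -2
```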
The choice of action/observation spaces, reward, and termination/reset conditions is part of the MDP formalization. I deliberately left it open for re-implementation in the way the code is structured, because I think a research consensus has yet to be reached on what these should be for complex/unstable systems like the quadrotor. On the subject, I would recommend Elia Kaufmann's recent ICRA paper: https://arxiv.org/abs/2202.10796
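As a concrete illustration, one way to re-implement those pieces is to subclass the environment and override its hooks. The sketch below assumes the single-agent-RL layout of this repository; the import path, the `_computeDone` hook name, and the `TARGET`/`MyGateAviary` names are assumptions for illustration and may differ across versions:

```python
import numpy as np
from gym_pybullet_drones.envs.single_agent_rl.FlyThruGateAviary import FlyThruGateAviary

TARGET = np.array([0.0, -2.0, 0.75])  # hypothetical fixed goal beyond the gate

class MyGateAviary(FlyThruGateAviary):
    """Example subclass with a custom reward and an early-termination rule."""

    def _computeReward(self):
        # Dense reward: negative squared distance to a fixed target.
        state = self._getDroneStateVector(0)
        return -np.linalg.norm(TARGET - state[0:3]) ** 2

    def _computeDone(self):
        # End the episode once the drone is within 5 cm of the target,
        # otherwise fall back to the parent's (time-based) condition.
        state = self._getDroneStateVector(0)
        reached = np.linalg.norm(TARGET - state[0:3]) < 0.05
        return bool(reached) or super()._computeDone()
```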
Understood! I might modify the reward a bit to better fit the MDP to my case. Thanks a lot for such a quick and elaborate response!
You absolutely should! This project is really meant as an open-source, low-complexity simulation to bootstrap all types of quadcopter RL/learning-based control experiments, not any specific set of tasks.
Hello, how can I stop the drone once it reaches the target?