gym-pybullet-drones

Definition of reward function?

Open Gariscat opened this issue 3 years ago • 4 comments

Hi, Jacopo! I have been looking at the source code of the FlyThruGateAviary, but could not figure out how the reward function is defined for each step. Could you kindly elaborate a bit? Thanks a lot!!!

```python
def _computeReward(self):
    """Computes the current reward value.

    Returns
    -------
    float
        The reward.

    """
    state = self._getDroneStateVector(0)
    norm_ep_time = (self.step_counter/self.SIM_FREQ) / self.EPISODE_LEN_SEC
    return -10 * np.linalg.norm(np.array([0, -2*norm_ep_time, 0.75])-state[0:3])**2
```

Gariscat avatar Mar 25 '22 03:03 Gariscat

Hi @Gariscat

It has been a while since I wrote that, but that line simply gives a negative reward proportional to the squared distance of the drone from a target position (x, y, z). The additional hack is that the y coordinate of the target shifts linearly from 0 to -2 over the course of the episode. There are certainly many other ways to do it.
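To make the moving target concrete, here is a minimal standalone sketch of the same computation outside the environment class. The function name `moving_target_reward` and the example values (a 240 Hz simulation, a 5 s episode) are assumptions for illustration only, not values taken from the repository.

```python
import numpy as np

def moving_target_reward(pos, step_counter, sim_freq, episode_len_sec):
    """Standalone sketch of the reward in _computeReward (hypothetical helper)."""
    # Fraction of the episode elapsed, from 0 at the start to 1 at the end.
    norm_ep_time = (step_counter / sim_freq) / episode_len_sec
    # The target's y coordinate slides from 0 to -2; x and z stay fixed.
    target = np.array([0.0, -2.0 * norm_ep_time, 0.75])
    # Negative reward, scaled by the squared Euclidean distance to the target.
    return -10.0 * np.linalg.norm(target - np.asarray(pos, dtype=float)) ** 2

# Assumed example values: 240 Hz simulation, 5 s episode.
# Start of episode: target is (0, 0, 0.75), drone at the origin.
r_start = moving_target_reward([0, 0, 0], step_counter=0, sim_freq=240, episode_len_sec=5)
# End of episode: target is (0, -2, 0.75), drone sitting exactly on it.
r_end = moving_target_reward([0, -2, 0.75], step_counter=1200, sim_freq=240, episode_len_sec=5)
```

With these inputs the drone starts 0.75 m from the target, so `r_start` is -10 * 0.75**2, while `r_end` is zero because the drone is on the final target.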

The choice of action/observation spaces, reward, and termination/reset conditions is part of the MDP formalization. I left these open for re-implementation in the way the code is structured because I think a research consensus has yet to be reached on what they should be in complex/unstable systems like the quadrotor. On the subject, I would recommend Elia Kaufmann's recent ICRA paper: https://arxiv.org/abs/2202.10796
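One way to experiment with these choices is to keep the reward and the termination test as small pure functions and call them from the methods you override in your own environment subclass (for example from `_computeReward`). The sketch below is only an illustration under stated assumptions: `reached_target`, `reach_reward`, the 0.05 m tolerance, and the bonus value are hypothetical and not part of gym-pybullet-drones.

```python
import numpy as np

def reached_target(pos, target, tol=0.05):
    """True when the drone is within `tol` meters of the target (assumed tolerance)."""
    dist = np.linalg.norm(np.asarray(target, dtype=float) - np.asarray(pos, dtype=float))
    return bool(dist < tol)

def reach_reward(pos, target, tol=0.05, bonus=10.0):
    """Hypothetical alternative reward: negative distance plus a bonus on arrival."""
    dist = float(np.linalg.norm(np.asarray(target, dtype=float) - np.asarray(pos, dtype=float)))
    reward = -dist
    if dist < tol:
        # One-off bonus for being inside the tolerance ball around the target.
        reward += bonus
    return reward

# Example: a fixed target at the end position used in the reward discussed above.
target = [0.0, -2.0, 0.75]
at_goal = reached_target([0.0, -2.0, 0.75], target)
far_away = reached_target([0.0, 0.0, 0.0], target)
```

Inside a subclass, the position would come from `self._getDroneStateVector(0)[0:3]`, and the boolean from `reached_target` could be returned by whichever method your version of the environment uses to signal the end of an episode.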

JacopoPan avatar Mar 25 '22 14:03 JacopoPan

Understood! I might modify the reward a bit to better fit the MDP to my case. Thanks a lot for such a quick and elaborate response!

Gariscat avatar Apr 03 '22 04:04 Gariscat

You absolutely should! This project is really meant as an open-source, low-complexity simulation to bootstrap all types of quadcopter RL/learning-based control experiments, not any specific set of tasks.

JacopoPan avatar Apr 03 '22 14:04 JacopoPan

> You absolutely should! This project is really meant as an open-source, low-complexity simulation to bootstrap all types of quadcopter RL/learning-based control experiments, not any specific set of tasks.

Hello, how can I stop the drone if it reaches the target?

zhengtiantian avatar Aug 02 '22 02:08 zhengtiantian