HighwayEnv

collect three types of rewards during testing

zijianh4 opened this issue · 3 comments

Hi, I am currently working on a project and running experiments on the highway-fast-v0 environment. I wonder if there is a way to get the three types of rewards at each step during testing directly, or whether I should modify the library code myself? Thx!

zijianh4 avatar Jan 09 '22 11:01 zijianh4

Hi @zijianh4, Currently, the several reward types (I imagine you mean safety, efficiency, comfort, etc.) are summed into a single scalar reward, which is optimised by the agent. If you want to keep track of these separate terms, one possibility is to add them to the info field defined in the OpenAI Gym interface, so that you can access them this way:

>>> obs, reward, done, info = env.step(action)
>>> info
{
  "speed": 15.0,
  "crashed": False,
  "acceleration": 1.1
}

for example.

That does require changing the code to add these additional info fields, yes. A few are currently written by default, see here: https://github.com/eleurent/highway-env/blob/9d63973da854584fe51b00ccee7b24b1bf031418/highway_env/envs/common/abstract.py#L150

You can add other fields directly there, or for anything env-specific you should rather override the _info() method in your environment.
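The override pattern suggested above can be sketched without highway-env installed. Note the class names below (a BaseEnv stand-in for the library's AbstractEnv) and the placeholder reward values are illustrative assumptions, not the library's actual code; in practice you would subclass your environment (e.g. HighwayEnvFast) and compute the terms from the vehicle state.

```python
# Minimal structural sketch of overriding _info(): start from the base
# class's default info dict, then add the separate reward terms to it.

class BaseEnv:
    """Stand-in for highway-env's AbstractEnv, which builds a default info dict."""
    def _info(self, obs, action):
        return {"speed": 15.0, "crashed": False, "action": action}

class MyEnv(BaseEnv):
    def _info(self, obs, action):
        # Keep the default fields, then append the per-term rewards.
        info = super()._info(obs, action)
        info["collision_reward"] = 0.0   # placeholder: derive from self.vehicle.crashed
        info["high_speed_reward"] = 0.4  # placeholder: derive from scaled speed
        return info

info = MyEnv()._info(obs=None, action=1)
print(sorted(info))  # ['action', 'collision_reward', 'crashed', 'high_speed_reward', 'speed']
```

The key point is returning the merged dict, so the extra keys show up in the info returned by env.step().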

eleurent avatar Jan 12 '22 15:01 eleurent

Hi @eleurent, Thanks for your reply. I override the _info() function in my testing code like this, but it does not seem to work. Specifically, I define the function before if __name__ == '__main__': like this:

import numpy as np

from highway_env import utils
from highway_env.envs.common.action import action_factory, Action, DiscreteMetaAction, ActionType
from highway_env.vehicle.controller import ControlledVehicle

Observation = np.ndarray

def _info_sep_re(self, obs: Observation, action: Action) -> dict:
    """
    Return a dictionary of additional information
    :param obs: current observation
    :param action: current action
    :return: info dict
    """
    neighbours = self.road.network.all_side_lanes(self.vehicle.lane_index)
    lane = self.vehicle.target_lane_index[2] if isinstance(self.vehicle, ControlledVehicle) \
        else self.vehicle.lane_index[2]
    scaled_speed = utils.lmap(self.vehicle.speed, self.config["reward_speed_range"], [0, 1])
    collision_reward = self.config["collision_reward"] * self.vehicle.crashed
    right_lane_reward = self.config["right_lane_reward"] * lane / max(len(neighbours) - 1, 1)
    high_speed_reward = self.config["high_speed_reward"] * np.clip(scaled_speed, 0, 1)
    info = {
        "speed": self.vehicle.speed,
        "crashed": self.vehicle.crashed,
        "action": action,
        "collision_reward": collision_reward,
        "right_lane_reward": right_lane_reward,
        "high_speed_reward": high_speed_reward,
    }
    try:
        info["cost"] = self._cost(action)
    except NotImplementedError:
        pass
    return 

and then I set highway_env._info = _info_sep_re inside if __name__ == '__main__':. There is no error when overriding the function, but when I try to log "collision_reward", "right_lane_reward" and "high_speed_reward", it doesn't work and those keys are missing from the info dictionary. Could you please help me with this issue? Thx!

zijianh4 avatar Jan 12 '22 18:01 zijianh4

Shouldn't you just replace return with return info?

eleurent avatar Jan 13 '22 13:01 eleurent