
Multi Agent done and reward

Open milad9005 opened this issue 2 years ago • 1 comments

Hello, I'm working on a multi-agent setup and am using this environment for testing.

One question: with a multi-agent observation, when I call env.step(action) I expect to receive a reward and a done flag for each agent, the same way next_state is returned as an array with one entry per agent. But this is not the case. How can I solve this problem?

milad9005, Mar 15 '22 14:03

Yes, that is because, by default, the agent rewards are aggregated into a single signal by summing them, in a cooperative fashion. See e.g.:

https://github.com/eleurent/highway-env/blob/049888adea0537b8e2d1d52aa0ae5b5722610629/highway_env/envs/intersection_env.py#L68

This was done because it allows single-agent RL algorithms from standard libraries to be used in a multi-agent setting, as long as they support tuple observations and actions.

If you want to do proper multi-agent training where each agent optimises its own reward, you should replace this line with:

return tuple(self._agent_reward(action, vehicle) for vehicle in self.controlled_vehicles)
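To make the difference concrete, here is a minimal toy sketch of the two return shapes. It does not import highway-env: ToyVehicle, ToyMultiAgentEnv and the reward formula are hypothetical stand-ins, and only the names _agent_reward and controlled_vehicles are taken from the line above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyVehicle:
    """Hypothetical stand-in for a controlled vehicle."""
    speed: float
    crashed: bool = False


class ToyMultiAgentEnv:
    """Toy stand-in for an environment; not the highway-env class."""

    def __init__(self, vehicles: List[ToyVehicle]) -> None:
        self.controlled_vehicles = vehicles

    def _agent_reward(self, action: int, vehicle: ToyVehicle) -> float:
        # Made-up reward for illustration: penalise a crash, otherwise reward speed.
        return -5.0 if vehicle.crashed else vehicle.speed / 10.0

    def cooperative_reward(self, action: int) -> float:
        # One scalar shared by all agents (sum aggregation).
        return sum(self._agent_reward(action, v) for v in self.controlled_vehicles)

    def per_agent_reward(self, action: int) -> Tuple[float, ...]:
        # One reward per controlled vehicle, in the order of controlled_vehicles.
        return tuple(self._agent_reward(action, v) for v in self.controlled_vehicles)


env = ToyMultiAgentEnv([ToyVehicle(speed=10.0), ToyVehicle(speed=20.0, crashed=True)])
print(env.cooperative_reward(action=0))  # -4.0
print(env.per_agent_reward(action=0))    # (1.0, -5.0)
```

With the tuple version, the reward at index i belongs to the vehicle at index i of controlled_vehicles, so the training loop has to unpack it per agent instead of treating it as a single float.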

Maybe this should be the default, and cooperative aggregation should be enabled by a config, though.
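A config switch along those lines could look roughly like the sketch below. This is only an illustration of the idea: the key "cooperative_reward" and the helper compute_reward are hypothetical and do not exist in highway-env.

```python
from typing import Dict, Sequence, Tuple, Union

Reward = Union[float, Tuple[float, ...]]


def compute_reward(agent_rewards: Sequence[float], config: Dict[str, bool]) -> Reward:
    """Return per-agent rewards by default, or their sum when cooperation is enabled.

    "cooperative_reward" is a hypothetical config key used only for this sketch.
    """
    if config.get("cooperative_reward", False):
        # Cooperative setting: a single scalar shared by all agents.
        return sum(agent_rewards)
    # Default: one reward per agent, in the order the rewards were given.
    return tuple(agent_rewards)


print(compute_reward([1.0, -5.0], {}))                            # (1.0, -5.0)
print(compute_reward([1.0, -5.0], {"cooperative_reward": True}))  # -4.0
```

Defaulting the flag to False matches the suggestion that per-agent rewards be the default, while single-agent algorithms could still opt in to the aggregated scalar.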

eleurent, Mar 15 '22 21:03