HighwayEnv
Multi Agent done and reward
Hello, I'm working on multi-agent RL and use this environment for testing.
One question: with a multi-agent observation, when I call env.step(action) I expect to receive a reward and a done flag for each agent, like the next_state array. But this is not the case. How can I solve this problem?
Yes, that is because by default the agent rewards are aggregated into a single signal with a sum, in a cooperative fashion. See e.g.
https://github.com/eleurent/highway-env/blob/049888adea0537b8e2d1d52aa0ae5b5722610629/highway_env/envs/intersection_env.py#L68
This was done because it makes it possible to use single-player RL algorithms from standard libraries in a multi-agent setting, as long as they support tuple observations and actions.
If you want to do proper multi-agent training where each agent optimises its own reward, you should replace this line with:
return tuple(self._agent_reward(action, vehicle) for vehicle in self.controlled_vehicles)
Maybe this should be the default, and cooperative aggregation should be enabled by a config, though.
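To make the difference concrete, here is a minimal sketch of the two reward modes side by side, switched by a flag as suggested above. It does not import highway-env: `ToyMultiAgentEnv`, `Vehicle`, the `cooperative` flag and the toy reward formula are all hypothetical stand-ins. Only the names `_agent_reward` and `controlled_vehicles` are taken from the line quoted above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Vehicle:
    """Hypothetical stand-in for a controlled vehicle."""
    speed: float
    crashed: bool = False


class ToyMultiAgentEnv:
    """Toy environment (not highway-env's class) showing both reward modes."""

    def __init__(self, controlled_vehicles: List[Vehicle], cooperative: bool = True):
        self.controlled_vehicles = controlled_vehicles
        # Assumed config flag: True sums the rewards, False keeps one per agent.
        self.cooperative = cooperative

    def _agent_reward(self, action: Tuple[int, ...], vehicle: Vehicle) -> float:
        # Made-up reward: penalise a crash, otherwise reward speed.
        return -1.0 if vehicle.crashed else vehicle.speed / 10.0

    def _reward(self, action: Tuple[int, ...]) -> Union[float, Tuple[float, ...]]:
        rewards = tuple(
            self._agent_reward(action, vehicle)
            for vehicle in self.controlled_vehicles
        )
        # Cooperative: a single scalar. Otherwise: one reward per agent.
        return sum(rewards) if self.cooperative else rewards


vehicles = [Vehicle(speed=10.0), Vehicle(speed=5.0, crashed=True)]
action = (0, 0)  # one action per controlled vehicle

print(ToyMultiAgentEnv(vehicles, cooperative=True)._reward(action))
print(ToyMultiAgentEnv(vehicles, cooperative=False)._reward(action))
```

With per-agent rewards, the return value of env.step is no longer a scalar, so single-agent libraries that expect a float reward would likely need an adapter.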