multiagent_mujoco: Observations are mapped to each agent, but what about each agent's actions?


Open · PBarde opened this issue 3 years ago · 0 comments

I have trouble understanding where the list of per-agent action vectors (the one you pass to the MujocoMulti env) is reassembled into the single-agent MuJoCo env's action vector so that each action reaches the correct actuator. For example, see line 111 of mujoco_multi.py:

https://github.com/schroederdewitt/multiagent_mujoco/blob/97eab01fcff0313f1a1c275115c10616988145a3/src/multiagent_mujoco/mujoco_multi.py#L111

It seems that the multi-agent action list is simply flattened and then passed to the single-agent MuJoCo env. I do not see how this could handle both the 2-Agent Ant and the 2-Agent Ant Diag setups. Looking at Figure 4 of the FACMAC paper, panels H and I give:

2-Agent Ant (Figure 4 H):

MA action list = [blue agent, green agent] = [[a1, a2, a5, a6], [a3, a4, a7, a8]]

Flattened single agent action = [a1, a2, a5, a6, a3, a4, a7, a8]

2-Agent Ant Diag (Figure 4 I):

MA action list = [blue agent, green agent] = [[a3, a4, a5, a6], [a1, a2, a7, a8]]

Flattened single agent action = [a3, a4, a5, a6, a1, a2, a7, a8]
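The flattening in the two examples above can be checked with a small sketch. It assumes, as the examples do, that a1 to a8 are the single-agent env's actuators in index order, and that the env applies entry i (0-based) of the flat action vector to actuator a(i+1). The partition lists are transcribed from Figure 4 H and I; the helper names are hypothetical and not part of multiagent_mujoco.

```python
# Actuator labels owned by each agent (blue, green), 1-based as in Figure 4 H and I.
PARTITIONS = {
    "2-Agent Ant": [[1, 2, 5, 6], [3, 4, 7, 8]],
    "2-Agent Ant Diag": [[3, 4, 5, 6], [1, 2, 7, 8]],
}


def flattened_order(partition):
    """Actuator labels in the order they appear after concatenating agents' actions."""
    return [label for agent in partition for label in agent]


def misrouted(partition):
    """List (intended actuator, actuator actually driven) pairs.

    Assumption: the env applies entry i of the flat vector to actuator i + 1,
    so any entry whose intended label differs from i + 1 is misrouted.
    """
    return [
        (intended, position + 1)
        for position, intended in enumerate(flattened_order(partition))
        if intended != position + 1
    ]


for name, partition in PARTITIONS.items():
    print(name, flattened_order(partition), misrouted(partition))
```

Under these assumptions, neither setup yields actuator order: four entries are misrouted for 2-Agent Ant and six for 2-Agent Ant Diag.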

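So in neither setup is the flattened vector passed to the single-agent MuJoCo env in actuator order (a1, ..., a8), and the ordering differs between the two setups. If the env applies each entry to the actuator at the same index, an agent's action does not end up on the actuator it is meant to control.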

I think this means agents end up observing one limb but controlling another.
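For comparison, one way the reassembly could be made partition-aware is to scatter each agent's outputs to the actuator indices it owns instead of concatenating them in agent order. This is a hypothetical helper (`to_global_action` is not a multiagent_mujoco function), under the same assumption that labels a1 to a8 map to env indices 0 to 7.

```python
import numpy as np


def to_global_action(local_actions, partition, n_actuators):
    """Scatter per-agent actions into a single-agent action vector (hypothetical).

    partition[k][j] is the 1-based actuator label that agent k's j-th output
    should drive, e.g. transcribed from Figure 4 H or I.
    """
    global_action = np.zeros(n_actuators)
    filled = np.zeros(n_actuators, dtype=bool)
    for actions, actuators in zip(local_actions, partition):
        actions = np.asarray(actions, dtype=float)
        if actions.shape[0] != len(actuators):
            raise ValueError("agent action length does not match its actuator list")
        idx = np.asarray(actuators) - 1  # 1-based labels -> 0-based env indices
        if filled[idx].any():
            raise ValueError("actuator assigned to more than one agent")
        global_action[idx] = actions
        filled[idx] = True
    if not filled.all():
        raise ValueError("some actuators are not controlled by any agent")
    return global_action


# 2-Agent Ant Diag partition; each action value encodes its intended actuator label.
diag = [[3, 4, 5, 6], [1, 2, 7, 8]]
blue = [3.0, 4.0, 5.0, 6.0]
green = [1.0, 2.0, 7.0, 8.0]
print(to_global_action([blue, green], diag, 8))
```

Because each value equals the label of the actuator it targets, the output reading 1 through 8 in order shows every action landing on its intended actuator, which plain concatenation does not achieve for this partition.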

Am I missing something here?

Feb 14 '22 14:02 PBarde
