
Inconsistent between code and pseudocode in agent input

Open Ynjxsjmh opened this issue 3 years ago • 2 comments

Reading the pseudocode in the paper Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning:

[screenshot of the pseudocode from the paper]

The inputs of the agent network are τᵃₜ and uᵃₜ. According to the pseudocode, τ is a list of (oₜ, uₜ₋₁) pairs. τᵃ and uᵃ are introduced as follows in the paper:

At each time step, each agent a ∈ A ≡ {1,...,n} chooses an action uᵃ ∈ U. Each agent has an action-observation history τᵃ ∈ T ≡ (Z×U)*.

However, in the pymarl code, the inputs of the agent network seem to be not τ and u but o and u:

https://github.com/oxwhirl/pymarl/blob/73960e11c5a72e7f9c492d36dbfde02016fde05a/src/controllers/basic_controller.py#L77-92

    def _build_inputs(self, batch, t):
        # Assumes homogenous agents with flat observations.
        # Other MACs might want to e.g. delegate building inputs to each agent
        bs = batch.batch_size
        inputs = []
        inputs.append(batch["obs"][:, t])  # b1av
        if self.args.obs_last_action:
            if t == 0:
                inputs.append(th.zeros_like(batch["actions_onehot"][:, t]))
            else:
                inputs.append(batch["actions_onehot"][:, t-1])
        if self.args.obs_agent_id:
            inputs.append(th.eye(self.n_agents, device=batch.device).unsqueeze(0).expand(bs, -1, -1))

        inputs = th.cat([x.reshape(bs*self.n_agents, -1) for x in inputs], dim=1)
        return inputs

In your implementation, inputs is constructed from batch["obs"][:, t] and batch["actions_onehot"][:, t-1], i.e. only the current observation and the last action, rather than the action-observation history and action.

Ynjxsjmh avatar Aug 02 '21 01:08 Ynjxsjmh

The action-observation history is encoded by RNN. We also recommend our fine-tuned QMIX: https://github.com/hijkzzz/pymarl2.
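A minimal sketch of this point (toy shapes and class name of my own, not the pymarl source): unrolling a GRUCell, the hidden state after step t is a function of every per-step input fed so far, so feeding only (oₜ, uₜ₋₁) each step while carrying h forward is how the network conditions on the whole history τ.

```python
import torch
import torch.nn as nn

# Toy DRQN-style agent: per step it sees only the current input
# (a stand-in for concat(o_t, u_{t-1})), but the GRU hidden state h
# accumulates all earlier steps -- i.e. tau is encoded in h.
class TinyRNNAgent(nn.Module):
    def __init__(self, input_dim, hidden_dim, n_actions):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, n_actions)

    def forward(self, step_input, h):
        x = torch.relu(self.fc1(step_input))
        h = self.rnn(x, h)           # h carries the history forward
        return self.fc2(h), h        # Q-values for this step, new hidden

agent = TinyRNNAgent(input_dim=8, hidden_dim=16, n_actions=4)
h = torch.zeros(1, 16)               # fresh hidden state at episode start
for t in range(5):                   # unroll: h_t depends on all earlier inputs
    step = torch.randn(1, 8)         # stand-in for concat(o_t, u_{t-1})
    q, h = agent(step, h)
```

After the loop, h is a (learned) summary of all five steps; the Q-values at the final step therefore depend on the full trajectory, not just the last observation.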

hijkzzz avatar Aug 06 '21 22:08 hijkzzz

As far as I know, _build_inputs() is only used in the forward() method, in which the DRQN model is used.

https://github.com/oxwhirl/pymarl/blob/73960e11c5a72e7f9c492d36dbfde02016fde05a/src/controllers/basic_controller.py#L26-L29

You say "the action-observation history is encoded by RNN", but I didn't see anything related in the agent.forward() method.

https://github.com/oxwhirl/pymarl/blob/73960e11c5a72e7f9c492d36dbfde02016fde05a/src/modules/agents/rnn_agent.py#L18-L23
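Paraphrasing the structure of the linked forward() (my own sketch reconstructed from the permalink, not the verbatim source), the only recurrent piece is the hidden_state argument:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the DRQN agent's forward() (paraphrase, not verbatim):
# per call it receives only the current step's inputs, plus a hidden
# state threaded in from the previous call.
class SketchRNNAgent(nn.Module):
    def __init__(self, input_shape, rnn_hidden_dim, n_actions):
        super().__init__()
        self.rnn_hidden_dim = rnn_hidden_dim
        self.fc1 = nn.Linear(input_shape, rnn_hidden_dim)
        self.rnn = nn.GRUCell(rnn_hidden_dim, rnn_hidden_dim)
        self.fc2 = nn.Linear(rnn_hidden_dim, n_actions)

    def forward(self, inputs, hidden_state):
        x = F.relu(self.fc1(inputs))
        h_in = hidden_state.reshape(-1, self.rnn_hidden_dim)
        # hidden_state is the only argument that could carry past steps
        h = self.rnn(x, h_in)
        q = self.fc2(h)
        return q, h

agent = SketchRNNAgent(input_shape=10, rnn_hidden_dim=32, n_actions=5)
h = torch.zeros(2, 32)
q, h = agent(torch.randn(2, 10), h)
```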

Ynjxsjmh avatar Aug 07 '21 01:08 Ynjxsjmh