
Regarding policies and reward functions

Open zyzhang1130 opened this issue 4 years ago • 1 comment

Hi, may I ask how you define more than one policy and reward function concurrently in a multi-agent setting? Thank you.

zyzhang1130 • Jan 24 '20 03:01

maddpg.py

def update(self, batch_size):
    obs_batch, indiv_action_batch, indiv_reward_batch, next_obs_batch, \
        global_state_batch, global_actions_batch, global_next_state_batch, done_batch = \
        self.replay_buffer.sample(batch_size)

    for i in range(self.num_agents):
        obs_batch_i = obs_batch[i]
        indiv_action_batch_i = indiv_action_batch[i]
        indiv_reward_batch_i = indiv_reward_batch[i]
        next_obs_batch_i = next_obs_batch[i]

        next_global_actions = []

        for agent in self.agents:
            next_obs_batch_i = torch.FloatTensor(next_obs_batch_i)
            # Question: should this be next_obs_batch[idx] rather than next_obs_batch_i?
            indiv_next_action = agent.actor.forward(next_obs_batch_i)

        # I think it should instead be:
        for idx, agent in enumerate(self.agents):
            indiv_next_action = agent.actor.forward(
                torch.tensor(next_obs_batch[idx], dtype=torch.float).to(agent.device))
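
To make the difference between the two loops concrete, here is a minimal toy sketch in plain Python. It is not the repository's code: `ToyAgent`, `act`, and both helper functions are hypothetical stand-ins for the torch actors, and the `next_obs_batch[agent][sample]` layout is assumed from the snippet above. It only shows that feeding every agent `next_obs_batch[i]` gives a different result from feeding each agent its own `next_obs_batch[idx]`.

```python
# Hypothetical stand-in for an agent; the real agents wrap a torch actor network.
class ToyAgent:
    def __init__(self, scale):
        self.scale = scale

    def act(self, obs_batch):
        # Stand-in "policy": scale every observation feature by a per-agent constant.
        return [[self.scale * x for x in obs] for obs in obs_batch]


def next_actions_shared_obs(agents, next_obs_batch, i):
    # Pattern from the quoted loop: every agent is fed agent i's next observations.
    return [agent.act(next_obs_batch[i]) for agent in agents]


def next_actions_own_obs(agents, next_obs_batch):
    # Proposed pattern: agent idx is fed its own slice, next_obs_batch[idx].
    return [agent.act(next_obs_batch[idx]) for idx, agent in enumerate(agents)]


agents = [ToyAgent(1.0), ToyAgent(10.0)]

# Assumed layout: next_obs_batch[agent][sample] is one observation vector.
next_obs_batch = [
    [[1.0, 2.0], [3.0, 4.0]],  # agent 0
    [[5.0, 6.0], [7.0, 8.0]],  # agent 1
]

shared = next_actions_shared_obs(agents, next_obs_batch, i=0)
own = next_actions_own_obs(agents, next_obs_batch)

print(shared[1])  # agent 1 acting on agent 0's observations
print(own[1])     # agent 1 acting on its own observations
```

With `i=0`, agent 0 gets the same input either way, but agent 1 is evaluated on agent 0's observations in the first version, which is the mismatch I am asking about.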

yyds-xtt • Jul 17 '21 08:07