Multi-agent-reinforcement-learning

Implementation of Multi-Agent Reinforcement Learning algorithm(s). Currently includes: MADDPG

Results: 7 Multi-agent-reinforcement-learning issues

maddpg.py

    def update(self, batch_size):
        (obs_batch, indiv_action_batch, indiv_reward_batch, next_obs_batch,
         global_state_batch, global_actions_batch, global_next_state_batch,
         done_batch) = self.replay_buffer.sample(batch_size)

        for i in range(self.num_agents):
            obs_batch_i = obs_batch[i]
            indiv_action_batch_i = indiv_action_batch[i]
            indiv_reward_batch_i = indiv_reward_batch[i]
            next_obs_batch_i = next_obs_batch[i]
            ...
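The snippet above unpacks eight values from `self.replay_buffer.sample(batch_size)` and then indexes the first four by agent. The repository's buffer is not shown here, so the following is only a minimal sketch of a buffer that would return data in that layout. The class name `MultiAgentReplayBuffer`, the `push` signature, and the choice to build the global state and global actions by concatenating the per-agent values are all assumptions made for illustration, not the author's implementation.

```python
import random
from collections import deque


class MultiAgentReplayBuffer:
    """Hypothetical buffer (an assumption, not the repository's class)."""

    def __init__(self, num_agents, max_size=10000, seed=None):
        self.num_agents = num_agents
        self.buffer = deque(maxlen=max_size)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def push(self, obs, actions, rewards, next_obs, done):
        # obs, actions, rewards, next_obs each hold one entry per agent.
        self.buffer.append((obs, actions, rewards, next_obs, done))

    def __len__(self):
        return len(self.buffer)

    def sample(self, batch_size):
        transitions = self.rng.sample(list(self.buffer), batch_size)
        n = self.num_agents

        # Per-agent batches: index i selects agent i, as in update().
        obs_batch = [[] for _ in range(n)]
        indiv_action_batch = [[] for _ in range(n)]
        indiv_reward_batch = [[] for _ in range(n)]
        next_obs_batch = [[] for _ in range(n)]

        # Global batches: one entry per sampled transition.
        global_state_batch = []
        global_actions_batch = []
        global_next_state_batch = []
        done_batch = []

        for obs, actions, rewards, next_obs, done in transitions:
            for i in range(n):
                obs_batch[i].append(obs[i])
                indiv_action_batch[i].append(actions[i])
                indiv_reward_batch[i].append(rewards[i])
                next_obs_batch[i].append(next_obs[i])
            # Assumption: the global view is the concatenation of all agents.
            global_state_batch.append([x for o in obs for x in o])
            global_actions_batch.append([x for a in actions for x in a])
            global_next_state_batch.append([x for o in next_obs for x in o])
            done_batch.append(done)

        return (obs_batch, indiv_action_batch, indiv_reward_batch,
                next_obs_batch, global_state_batch, global_actions_batch,
                global_next_state_batch, done_batch)
```

With this layout, each agent keeps its own observations, actions, and reward, while a centralized critic can read the global batches. A real implementation would typically convert these lists to tensors before the update.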

Hi, may I ask how you define more than one policy and reward function concurrently in a multi-agent setting? Thank you.