
This is the official implementation of Multi-Agent PPO (MAPPO).

25 open issues in on-policy

Hi, I have an issue when reproducing the performance of simple_spread in MPE. The only modifications to your code are: 1. passing `--use_wandb` to disable wandb in `train_mpe.sh`, and 2. adding `self.envs.reset()`...
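Using `--use_wandb` to *disable* wandb is not a typo: in the repo's `config.py` the argument appears to be declared with `action='store_false'`, so passing the flag flips the default `True` to `False`. A minimal sketch of that pattern (the help text is paraphrased, not quoted):

```python
import argparse

parser = argparse.ArgumentParser()
# Default is True; passing --use_wandb on the command line stores False,
# which is why the flag *disables* wandb logging (tensorboard is used instead).
parser.add_argument("--use_wandb", action='store_false', default=True,
                    help="by default True, logs to the wandb server; "
                         "if the flag is passed, tensorboard is used instead")

args = parser.parse_args(["--use_wandb"])
print(args.use_wandb)  # False
```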

the function huber_loss in utils is like:

```python
def huber_loss(e, d):
    a = (abs(e) <= d).float()
    b = (e > d).float()
    return a*e**2/2 + b*d*(abs(e)-d/2)
```

It may come with a zero loss when error is...
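Presumably the problem being reported is that `b = (e > d).float()` only covers large *positive* errors: when `e < -d`, both indicator masks `a` and `b` are zero and the loss vanishes. A hedged sketch of the fix, using `abs(e)` in both masks to match the standard Huber loss:

```python
import torch

def huber_loss(e, d):
    """Standard Huber loss: quadratic for |e| <= d, linear beyond."""
    a = (abs(e) <= d).float()   # quadratic region
    b = (abs(e) > d).float()    # linear region (now also covers e < -d)
    return a * e**2 / 2 + b * d * (abs(e) - d / 2)

# With the original mask b = (e > d).float(), an error of e = -2*d
# would fall in neither region and contribute zero loss.
e = torch.tensor([-2.0])
print(huber_loss(e, d=1.0))  # tensor([1.5000]), not 0
```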

https://github.com/marlbenchmark/on-policy/blob/0483adc4b55233c649eece2458fe6fba367d26d9/onpolicy/envs/starcraft2/StarCraft2_Env.py#L560-L577 It should be `bad_transition = True` on line 563.
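For context, `bad_transition` is the usual flag for marking a time-limit truncation as distinct from a genuine terminal state, so the critic can still bootstrap from the final observation instead of using a zero terminal value. A hedged sketch of the pattern (names and structure are illustrative, not the repo's exact code):

```python
def end_of_step_info(episode_steps, episode_limit, battle_won, battle_lost):
    """Illustrative sketch of time-limit handling inside an env's step().

    When the episode ends only because the step budget ran out, the
    transition is 'bad': the return is truncated rather than terminal,
    so the learner should bootstrap V(s') for the value target.
    """
    done = battle_won or battle_lost
    truncated = (not done) and episode_steps >= episode_limit
    info = {"bad_transition": truncated}
    return done or truncated, info
```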

In the `config.py` file, there is this env parameter:

```python
parser.add_argument("--use_obs_instead_of_state", action='store_true',
                    default=False,
                    help="Whether to use global state or concatenated obs")
```

I would like to use a global state...
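As I read that flag, enabling it makes the centralized critic consume the concatenation of all agents' local observations instead of the environment-provided global state. A minimal sketch of that idea (a general illustration, not the repo's exact code path):

```python
import numpy as np

def build_share_obs(obs_list, env_state, use_obs_instead_of_state):
    """Hypothetical helper: choose the centralized critic's input.

    obs_list:  list of per-agent local observation arrays
    env_state: the environment's own global state vector
    """
    if use_obs_instead_of_state:
        return np.concatenate(obs_list, axis=-1)  # concatenated local obs
    return env_state                              # environment global state
```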

The output of python is listed here:

```
Traceback (most recent call last):
  File "train/train_football.py", line 203, in <module>
    main(sys.argv[1:])
  File "train/train_football.py", line 188, in main
    runner.run()
  File "/onpolicy/runner/shared/football_runner.py", line 43, in ...
```

Hello, thanks for open-sourcing such good work. I was wondering if you could also open-source the MASAC code base, as it would help to understand the variations of MASAC...

Hello! When agents have inconsistent action dimensions, how does MAPPO perform action masking?
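One common approach (a sketch under my own assumptions, not necessarily what the repo does for every env): pad every agent's action space to the largest dimension and mask the padding slots by pushing their logits to a large negative value before building the distribution, so heterogeneous agents can share one policy head.

```python
import torch
from torch.distributions import Categorical

def masked_categorical(logits, available_actions):
    """available_actions: 0/1 tensor of shape (batch, max_action_dim);
    padded (invalid) actions are marked 0."""
    masked_logits = logits.clone()
    masked_logits[available_actions == 0] = -1e10  # ~zero probability
    return Categorical(logits=masked_logits)

# Usage: an agent with only 3 real actions in a 5-dim padded space.
logits = torch.zeros(1, 5)
avail = torch.tensor([[1, 1, 1, 0, 0]])
dist = masked_categorical(logits, avail)
print(dist.probs)  # the two padding actions get ~0 probability
```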

Hi, I find something odd and I'd like to know if there's something I'm missing or if it's normal. In the buffers, you define the action_log_probs to have "act_shape" as...
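One general observation that may explain the shape (not a claim about the repo's exact buffer layout): for a discrete policy, a sampled action is stored as a single index and its log-probability is a single scalar, so the log-prob buffer only needs a trailing dimension of 1 even when the policy head outputs many logits.

```python
import torch
from torch.distributions import Categorical

dist = Categorical(logits=torch.zeros(4, 7))  # batch of 4 agents, 7 actions each
actions = dist.sample()                       # shape (4,): one index per agent
log_probs = dist.log_prob(actions)            # shape (4,): one scalar per action
print(actions.shape, log_probs.shape)
```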