
This is the official implementation of Multi-Agent PPO (MAPPO).

Results 25 on-policy issues

Why save only the latest model instead of the best-performing one? If I want to save the best model, what should I add? Thanks!
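One common way to keep the best-performing model is to track the highest evaluation score seen so far and write a separate checkpoint whenever it improves. The following is only a minimal sketch of that idea, not the repository's code: `BestCheckpointTracker`, its methods, and the reward values are all hypothetical and made up for illustration.

```python
import math


class BestCheckpointTracker:
    """Hypothetical helper: remembers the best evaluation score seen so far."""

    def __init__(self):
        self.best_score = -math.inf
        self.best_step = None

    def update(self, score, step):
        """Return True when `score` beats the best so far (the caller should then save)."""
        if score > self.best_score:
            self.best_score = score
            self.best_step = step
            return True
        return False


tracker = BestCheckpointTracker()
saved_steps = []
# Made-up (step, evaluation reward) pairs standing in for periodic evaluations.
for step, eval_reward in [(0, -120.0), (1, -95.5), (2, -101.2), (3, -80.0)]:
    if tracker.update(eval_reward, step):
        # In a real training loop, write the model weights here, to a file
        # separate from the "latest" checkpoint so neither overwrites the other.
        saved_steps.append(step)
```

With these illustrative values, a "best" checkpoint would be written at steps 0, 1, and 3, while step 2 is skipped because its reward is lower than the best seen so far.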

When I tried to run ./train_mpe_spread.sh, I ran into the following error: ``` obs_space: [Box(18,), Box(18,), Box(18,)] share_obs_space: [Box(54,), Box(54,), Box(54,)] act_space: [Discrete(5), Discrete(5), Discrete(5)] Traceback (most recent call last): File...

The detailed errors are as follows: Traceback (most recent call last): File "../train/train_smac.py", line 260, in <module> main(sys.argv[1:]) File "../train/train_smac.py", line 138, in main "check recurrent policy!") AssertionError: check recurrent policy!...

Hi there, thanks for the great repository. One question: I see here https://github.com/marlbenchmark/on-policy/blob/4769caf56a9b2ccb90866ae56f1d9c804432e63b/onpolicy/scripts/train/train_hanabi_forward.py#L162 that there is supposed to be a separate `Runner` for Hanabi, but I can't find the file,...

Thank you for your contribution to the RL community. I have some questions about the replay buffer in both the shared and separated buffer settings. When I am training, I...

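As the title says: I found the save_replay() function in StarCraft2_Env.py, and it looks the same as the function in the SMAC source code. But how should I use it? The official SMAC README is very vague about this. Do you know how to do it? Many thanks!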

I ran ./onpolicy/scripts/train_mpe_scripts/train_mpe_spread.sh after changing 'algo' to mappo and user_name to my wandb username in train_mpe_spread.sh. My train_mpe_spread.sh is as follows: ```text #!/bin/sh env="MPE" scenario="simple_spread" num_landmarks=3 num_agents=3 algo="mappo" #"rmappo"...

Dear authors, Thank you for this work! Could you please address a question that confuses me? I notice that the gfootball env terminates at a maximum of 400 steps as...

Why is available_actions an array of shape (12800, 5) in which every row is [1, 1, 1, 1, 1]? A 1 simply indicates whether or not...
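For context, an availability mask is commonly applied by setting the logits of unavailable actions to negative infinity before the softmax. The sketch below assumes that convention (1 means the action may be chosen, 0 means it may not) and that at least one action is available; `masked_softmax` is a hypothetical helper written for illustration, not a function from the repository.

```python
import math


def masked_softmax(logits, available):
    """Softmax over `logits`, giving zero probability to entries whose mask is 0.

    Assumes at least one entry of `available` is 1.
    """
    masked = [l if a == 1 else -math.inf for l, a in zip(logits, available)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# A row of all ones leaves the distribution unchanged (uniform for equal logits).
all_available = masked_softmax([0.0] * 5, [1, 1, 1, 1, 1])
# Masking three of five actions spreads the probability over the remaining two.
partly_available = masked_softmax([0.0] * 5, [1, 0, 1, 0, 0])
```

Under this convention, a mask whose rows are all [1, 1, 1, 1, 1] has no effect on action selection, which is what one would expect in an environment where every action is always legal.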

Was running into the following error with the previous version: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and CPU!" Fixed...