pymarl2

Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)

Results 9 pymarl2 issues

Is it possible to get the latest smac_run_data.json? I found that the results provided by SMAC were out of date and only had 2 million

Hi there, is there a simple command to run an already trained model without learning? I understand I can comment out the training in the code and remove...

The source code from the NDQ paper is too old and doesn't work with recent versions of PyTorch. I modified the source so that it now works with recent PyTorch and is convenient to...

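After running the following command, `python src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor`, the results folder does not contain the log.json file that pymarl produces. Where does pymarl2 store the experiment data? Also, when I ran the original pymarl library with the configuration in default.yaml and sc2.yaml, log.json recorded data for only one episode. Is this because the configuration is set incorrectly?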

```
Traceback (most recent call last):
  File "src/main.py", line 14, in <module>
    from run import REGISTRY as run_REGISTRY
  File "/workspace/pymarl2/src/run/__init__.py", line 1, in <module>
    from .run import run as default_run
  File "/workspace/pymarl2/src/run/run.py",...
```

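Hello author, I have a question about the Q(λ) implementation. If the episode is terminated, its exp_return should be zero, but shouldn't the reward at the terminated state still be taken into account? I also don't quite understand this step: `reward = rewards[:, t] + exp_qvals[:, t] - qvals[:, t]  # off-policy correction`. What is the theoretical basis for it?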
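To make the question concrete, here is a minimal sketch of where the quoted line could sit inside a backward λ-return recursion. It is not taken from pymarl2: the function name `q_lambda_targets`, its signature, and the use of plain Python lists for a single unpadded episode are all assumptions made for illustration, and only the variable names `rewards`, `exp_qvals`, and `qvals` come from the quoted line. It also assumes that termination, if any, happens at the final step.

```python
def q_lambda_targets(rewards, terminated, exp_qvals, qvals, gamma, td_lambda):
    """Backward lambda-return recursion for one episode (hypothetical sketch).

    rewards[t] and terminated[t] cover steps 0..T-1.
    exp_qvals and qvals have T+1 entries (one extra for the final state).
    """
    T = len(rewards)
    ret = [0.0] * (T + 1)
    # Bootstrap from the final state only if the episode never terminated.
    ret[T] = exp_qvals[T] * (1.0 - float(any(terminated)))
    for t in range(T - 1, -1, -1):
        # The line quoted in the issue: reward plus (expected Q - taken-action Q).
        reward = rewards[t] + exp_qvals[t] - qvals[t]
        # One-step bootstrap, switched off at a terminating step.
        bootstrap = (1.0 - td_lambda) * gamma * exp_qvals[t + 1] * (1.0 - terminated[t])
        ret[t] = td_lambda * gamma * ret[t + 1] + reward + bootstrap
    return ret[:T]


# Example: three steps, terminating at the last one, with exp_qvals == qvals
# so that the correction term is zero.
targets = q_lambda_targets(
    rewards=[1.0, 1.0, 1.0],
    terminated=[0, 0, 1],
    exp_qvals=[0.5, 0.5, 0.5, 0.5],
    qvals=[0.5, 0.5, 0.5, 0.5],
    gamma=1.0,
    td_lambda=1.0,
)
```

In this sketch the reward at the terminating step is still added to the target; only the bootstrap term is zeroed there. When `exp_qvals` equals `qvals` the correction term vanishes and the recursion reduces to the usual λ-return.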

Hello, I have a question about your code and paper result. Currently, I am trying to reproduce the DOP algorithm using 3 random seeds (3, 4, 12). However, I noticed...

The PPO/MAPPO code will crash if `mini_epochs` > 1. The reason is that some advantages are computed outside the mini_epochs loop and then reused inside it; therefore,...

Hello author, I have a question regarding your code. Why isn't the hidden state of the GRU detached from the computational graph? This could lead to exploding/vanishing gradients. I've seen...