pymarl2

Published 20 hours ago

Metadata

Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)

Readme
Issues

Results 9 pymarl2 issues

The latest smac_run_data.json

1 comment

Is it possible to get the latest smac_run_data.json? I found that the results provided by SMAC are out of date and only had 2 million

Running Trained Agents

Hi there, is there a simple command to run an already trained model without further learning? I understand I can comment out the training in the code and remove...

Add NDQ algorithm

The source code from the NDQ paper is too old and doesn't work with recent PyTorch. I modified the source so that it now works with recent PyTorch and is convenient to...

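No log.json in Results when running pymarl2

After running the command `python src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor`, the results folder does not contain the log.json file that pymarl produces. Where does pymarl2 record experiment data? Also, when running the original pymarl library with the configuration in default.yaml and sc2.yaml, log.json only records data for a single episode. Is this because the configuration is set incorrectly?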

TypeError: 'Boost.Python.class' object is not iterable

1 comment

```
Traceback (most recent call last):
  File "src/main.py", line 14, in <module>
    from run import REGISTRY as run_REGISTRY
  File "/workspace/pymarl2/src/run/__init__.py", line 1, in <module>
    from .run import run as default_run
  File "/workspace/pymarl2/src/run/run.py",...
```

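Question about Q(λ)

Hello author, I have a question about the Q(λ) implementation. If the episode is terminated, its exp_return should be zero, but shouldn't the reward of the terminated state still be taken into account? I don't quite understand this step: `reward = rewards[:, t] + exp_qvals[:, t] - qvals[:, t]  # off-policy correction`. Is there a theoretical basis for it?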
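To make the termination question above concrete, here is a minimal sketch of a generic λ-return recursion for a single episode. It is not the repository's implementation and it omits the off-policy correction term quoted in the issue; the function name `lambda_returns` and its arguments are hypothetical. It only illustrates the usual convention that termination zeroes the bootstrap term while the reward of the final step is still counted.

```python
def lambda_returns(rewards, next_values, terminated, gamma=0.99, lam=0.8):
    """Generic lambda-return targets for one episode (illustrative only).

    rewards[t]     : reward received at step t
    next_values[t] : value estimate of the state reached after step t
    terminated[t]  : True if the episode ended at step t
    """
    horizon = len(rewards)
    returns = [0.0] * horizon
    # Beyond the last stored step we can only bootstrap from the value estimate.
    next_return = next_values[-1]
    for t in reversed(range(horizon)):
        not_done = 0.0 if terminated[t] else 1.0
        # Mix the one-step bootstrap with the longer lambda-return.
        bootstrap = (1.0 - lam) * next_values[t] + lam * next_return
        # Termination removes the bootstrap, but rewards[t] is always kept.
        returns[t] = rewards[t] + gamma * not_done * bootstrap
        next_return = returns[t]
    return returns


# The last step terminates, so its target is just its own reward (10.0),
# even though next_values[-1] is a large, meaningless number.
example = lambda_returns(
    rewards=[1.0, 1.0, 10.0],
    next_values=[5.0, 5.0, 99.0],
    terminated=[False, False, True],
    gamma=1.0,
    lam=1.0,
)
```

With `lam=1.0` and `gamma=1.0` the targets reduce to plain Monte Carlo returns, and with `lam=0.0` they reduce to one-step TD targets, which is a quick way to sanity-check any variant of this recursion.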

Performance issue in DOP

Hello, I have a question about your code and paper result. Currently, I am trying to reproduce the DOP algorithm using 3 random seeds (3, 4, 12). However, I noticed...

MAPPO bug, trying to backward through the graph a second time.

The PPO/MAPPO code will crash if `mini_epochs` > 1. The reason is that some advantages are computed outside the `mini_epochs` loop and then reused inside it; therefore,...
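The failure mode described in this issue can be reproduced in a few lines of PyTorch. The sketch below is a toy setup, not the repository's MAPPO learner: the `actor`, `critic`, tensor shapes, and loss are all hypothetical. It shows one common remedy, which is to detach the advantages computed before the loop so that repeated `backward()` calls never revisit an already freed graph.

```python
import torch

torch.manual_seed(0)
actor = torch.nn.Linear(4, 2)    # toy policy network (hypothetical)
critic = torch.nn.Linear(4, 1)   # toy value network (hypothetical)
params = list(actor.parameters()) + list(critic.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

obs = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
returns = torch.randn(8)

# Advantages are computed once, outside the loop. Without .detach() they stay
# attached to the critic's graph, which the first backward() frees, so the
# second mini-epoch raises the "backward through the graph a second time" error.
advantages = (returns - critic(obs).squeeze(-1)).detach()

mini_epochs = 3
steps_done = 0
for _ in range(mini_epochs):
    # Everything that needs gradients is recomputed inside the loop.
    log_probs = torch.log_softmax(actor(obs), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * advantages).mean()
    value_loss = (returns - critic(obs).squeeze(-1)).pow(2).mean()

    optimizer.zero_grad()
    (policy_loss + value_loss).backward()
    optimizer.step()
    steps_done += 1
```

Passing `retain_graph=True` to `backward()` is sometimes suggested instead, but it does not help once `optimizer.step()` has modified the critic's weights in place, so detaching (or recomputing the advantages inside the loop) is usually the cleaner option.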

Why not detach the hidden state of the GRU from the computational graph?

Hello author, I have a question regarding your code. Why isn't the hidden state of the GRU detached from the computational graph? This could lead to exploding/vanishing gradients. I've seen...
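For readers unfamiliar with what detaching the hidden state means in practice, the following is a minimal sketch of truncated backpropagation through time with a `torch.nn.GRUCell`. It is not taken from the repository, and whether it suits episode-based training as in this project is exactly what the issue asks; the chunk length, shapes, and loss are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)
gru = torch.nn.GRUCell(input_size=3, hidden_size=5)
head = torch.nn.Linear(5, 1)
params = list(gru.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

seq_len, batch, chunk_len = 12, 2, 4          # hypothetical sizes
inputs = torch.randn(seq_len, batch, 3)       # (time, batch, features)
targets = torch.randn(seq_len, batch, 1)

hidden = torch.zeros(batch, 5)
updates = 0
for start in range(0, seq_len, chunk_len):
    # Keep the hidden state's values but cut its history, so gradients
    # flow back through at most chunk_len steps.
    hidden = hidden.detach()
    loss = torch.zeros(())
    for t in range(start, start + chunk_len):
        hidden = gru(inputs[t], hidden)
        loss = loss + (head(hidden) - targets[t]).pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    updates += 1
```

The trade-off is that detaching bounds the gradient path length, which limits exploding or vanishing gradients, but it also prevents the network from learning dependencies longer than one chunk.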

About

reinforcement-learning, starcraft, smac, sota, marl

563 Stars

110 Forks
