
Performance issue in DOP

Open · KKGB opened this issue 5 months ago · 0 comments

Hello, I have a question about your code and the results reported in your paper.

Currently, I am trying to reproduce the DOP algorithm using 3 random seeds (3, 4, 12). However, I noticed a few issues:

🔍 Issue

  1. In the 2s3z and MMM maps, the performance drops significantly after a certain point, suggesting possible policy collapse (see attached figures).
  2. In the 2c_vs_64zg map, for which your paper reports around an 84% win rate, my runs consistently struggle to exceed a 20-30% win rate, even after sufficient training steps (the curves are averaged across seeds as sketched below).
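
To rule out a plotting or averaging mistake on my side, this is roughly how I aggregate the test win rate across the three seeds before plotting. It is only a sketch: the CSV file names and the step/win_rate layout are placeholders for however the curves are exported (e.g. from TensorBoard), and it assumes all seeds log the same number of evaluation points.

```python
import numpy as np

seeds = [3, 4, 12]
# Hypothetical per-seed exports with columns "step,win_rate" (file names are placeholders).
curves = [np.loadtxt(f"dop_2c_vs_64zg_seed{s}.csv", delimiter=",", skiprows=1)
          for s in seeds]

# Stack into (num_seeds, num_points); assumes every seed has the same number of points.
win_rates = np.stack([c[:, 1] for c in curves])
mean = win_rates.mean(axis=0)   # mean test win rate at each evaluation step
std = win_rates.std(axis=0)     # spread across seeds
print(f"final win rate: {mean[-1]:.2f} +/- {std[-1]:.2f}")
```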

I am wondering if there might be something wrong with my hyperparameter settings. The hyperparameters I used are listed below; for these settings, I referred to the values reported in your paper. Any help would be appreciated. Thanks!

Hyperparameters

action_selector: "multinomial"
epsilon_start: .5
epsilon_finish: .05
epsilon_anneal_time: 500000
mask_before_softmax: False

runner: "parallel"
mac: "dop_mac"

buffer_size: 32
off_buffer_size: 5000
batch_size_run: 8 #10
batch_size: 32
off_batch_size: 64
t_max: 10000000

env_args:
  state_last_action: False
target_update_interval: 600
step: 5

ent_coef: 0
lr: 0.0005
critic_lr: 0.0001
td_lambda: 0.8
tb_lambda: 0.93

# use qmix
mixing_embed_dim: 32

# use COMA
agent_output_type: "pi_logits"
learner: "offpg_learner"
critic_q_fn: "coma"
critic_baseline_fn: "coma"
critic_train_mode: "seq"
critic_train_reps: 1
q_nstep: 0  # 0 corresponds to default Q, 1 is r + gamma*Q, etc
grad_norm_clip: 20

name: "dop"

run: "dop_run"

KKGB · May 22 '25 07:05