
cannot reproduce the performance of MPE


Hi, I'm running into an issue when trying to reproduce the performance of simple_spread in MPE.

The only modifications I made to your code:

  1. pass --use_wandb in train_mpe.sh to disable wandb logging (see the sketch below)
  2. add self.envs.reset() before line 26 in mpe_runner.py
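
In case point 1 looks backwards: wandb logging is on by default, and passing the flag turns it off. A minimal sketch of that argparse pattern (assuming the repo's config defines --use_wandb with a store_false action, which matches the behaviour above):

```python
import argparse

parser = argparse.ArgumentParser()
# Assumption: the flag uses store_false, so the default is True (log to
# wandb) and passing --use_wandb flips it to False (use tensorboard instead).
parser.add_argument("--use_wandb", action="store_false", default=True)

print(parser.parse_args([]).use_wandb)               # True  -> wandb enabled
print(parser.parse_args(["--use_wandb"]).use_wandb)  # False -> wandb disabled
```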

hccz95 · Jun 23 '22

The reset logic is already implemented in env_wrapper.py, so you don't need to add reset() in the runner files.
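
A minimal sketch of that wrapper pattern (names are illustrative, not the repo's exact code): the vectorized wrapper exposes a single reset() and also auto-resets finished sub-environments inside step(), which is why an extra reset() call in the runner is redundant.

```python
import numpy as np

class DummyVecEnv:
    """Illustrative vectorized-env wrapper; not the repo's exact class."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        # One call resets every sub-environment and stacks the observations.
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs, rews, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if np.all(d):        # auto-reset: a finished episode restarts
                o = env.reset()  # here, so the runner never calls reset()
            obs.append(o); rews.append(r); dones.append(d); infos.append(info)
        return np.stack(obs), np.stack(rews), np.stack(dones), infos
```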

Could you post an image of the hyperparameters you used and the performance you achieved?

zoeyuchao · Sep 14 '22

Hi! I'm also confused about the results of simple_spread. Like hccz95, I passed --use_wandb to disable wandb in train_mpe.sh, and I got the following reward curve:

[reward curve image]

It converges at around -120, but according to the paper it should be around -110. Is that acceptable? Here are my configurations; I left all the parameters at their defaults.

Train:

```bash
CUDA_VISIBLE_DEVICES=0 python train/train_mpe.py --env_name ${env} --algorithm_name ${algo} \
    --experiment_name ${exp} --scenario_name ${scenario} --num_agents ${num_agents} \
    --num_landmarks ${num_landmarks} --seed ${seed} --n_training_threads 1 \
    --n_rollout_threads 128 --num_mini_batch 1 --episode_length 25 \
    --num_env_steps 20000000 --ppo_epoch 10 --use_ReLU --gain 0.01 \
    --lr 7e-4 --critic_lr 7e-4 --use_wandb
```

Render:

```bash
CUDA_VISIBLE_DEVICES=0 python render/render_mpe.py --env_name ${env} --algorithm_name ${algo} \
    --experiment_name ${exp} --scenario_name ${scenario} --num_agents ${num_agents} \
    --num_landmarks ${num_landmarks} --seed ${seed} --n_training_threads 1 \
    --n_rollout_threads 1 --use_render --episode_length 25 --render_episodes 5 \
    --use_wandb --save_gifs --model_dir "path_to_model"
```

WaterHyacinthInNANHU · Oct 05 '22

num_env_steps is set to 20M, but your curve only reaches about 6M env steps, so just run longer :)
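
To make that concrete, a quick back-of-the-envelope check using the values from the training command posted above:

```python
# Values taken from the training command above.
n_rollout_threads = 128
episode_length = 25
num_env_steps = 20_000_000

steps_per_update = n_rollout_threads * episode_length  # 3,200 env steps per PPO update
total_updates = num_env_steps // steps_per_update      # 6,250 updates in a full run

# A curve that stops near 6M env steps has covered only ~30% of training.
print(6_000_000 / num_env_steps)  # 0.3
```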

zoeyuchao · Nov 04 '22