on-policy
cannot reproduce the performance of MPE
Hi, I ran into an issue while reproducing the performance of simple_spread in MPE.
The only modifications to your code:
- pass `--use_wandb` to disable wandb in `train_mpe.sh` (see the argparse sketch after this list)
- add `self.envs.reset()` before line 26 in `mpe_runner.py`
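For context on why passing `--use_wandb` turns logging *off*: the flag is presumably defined with `action="store_false"`, so it defaults to True and supplying it flips it to False. A minimal sketch of that pattern, assuming the common argparse idiom rather than quoting the repo's actual `config.py`:

```python
import argparse

# Illustration of the store_false idiom (an assumption, not the repo's actual config.py):
# wandb logging is on by default; passing --use_wandb on the command line turns it off.
parser = argparse.ArgumentParser()
parser.add_argument("--use_wandb", action="store_false", default=True,
                    help="log to wandb by default; pass this flag to disable wandb")

args = parser.parse_args(["--use_wandb"])
print(args.use_wandb)  # False -> wandb disabled
```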
The reset code is already handled in env_wrapper.py, so you don't need to add reset() in the runner files.
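For illustration, here is a simplified sketch of the pattern being described: the vectorized-env wrapper owns `reset()`, and the runner calls it once during warmup, so no extra `reset()` is needed in the runner loop. Class and method names below are hypothetical simplifications, not the repo's actual `env_wrappers.py` / `mpe_runner.py` code:

```python
import numpy as np

class DummyVecEnv:
    """Hypothetical simplified vectorized-env wrapper: batches several MPE envs."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        # Reset every sub-environment and stack the observations so the runner
        # receives an array shaped (n_rollout_threads, n_agents, obs_dim).
        return np.stack([env.reset() for env in self.envs])


class Runner:
    """Hypothetical simplified runner: warmup already resets the wrapped envs."""

    def __init__(self, envs):
        self.envs = envs

    def warmup(self):
        # reset() is invoked here, through the wrapper, which is why adding
        # another self.envs.reset() in the runner's collect loop is unnecessary.
        obs = self.envs.reset()
        return obs
```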
Could you post an image of the hyper-parameters you used and the performance you achieved?
Hi! I'm also confused about the results on simple_spread. Like hccz95, I only passed `--use_wandb` to disable wandb in `train_mpe.sh`, and got the following reward curve:
It converges at around -120, but according to the paper it should be around -110. Is that acceptable?
Here are my configurations; I left all the other parameters at their defaults:
Train

```sh
CUDA_VISIBLE_DEVICES=0 python train/train_mpe.py --env_name ${env} --algorithm_name ${algo} \
    --experiment_name ${exp} --scenario_name ${scenario} --num_agents ${num_agents} \
    --num_landmarks ${num_landmarks} --seed ${seed} --n_training_threads 1 --n_rollout_threads 128 \
    --num_mini_batch 1 --episode_length 25 --num_env_steps 20000000 --ppo_epoch 10 --use_ReLU \
    --gain 0.01 --lr 7e-4 --critic_lr 7e-4 --use_wandb
```
Render

```sh
CUDA_VISIBLE_DEVICES=0 python render/render_mpe.py --save_gifs --env_name ${env} --algorithm_name ${algo} \
    --experiment_name ${exp} --scenario_name ${scenario} --num_agents ${num_agents} \
    --num_landmarks ${num_landmarks} --seed ${seed} --n_training_threads 1 --n_rollout_threads 1 \
    --use_render --episode_length 25 --render_episodes 5 --use_wandb --model_dir "path_to_model"
```
The number of env steps is set to 20M, and your results are only at 6M steps, so just run longer :)
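As a rough sanity check of that point, the flags above determine how many training iterations 20M env steps correspond to, assuming each iteration collects one episode per rollout thread (i.e., num_env_steps / (episode_length * n_rollout_threads)). A curve that stops at 6M steps is therefore less than a third of the way through training. A small worked calculation in plain Python, with values taken from the command above:

```python
# Values from the training command above.
num_env_steps     = 20_000_000   # total environment-step budget
episode_length    = 25           # steps per episode
n_rollout_threads = 128          # parallel rollout environments

# Assumption: each training iteration collects one episode per rollout thread.
steps_per_iteration = episode_length * n_rollout_threads      # 3,200 env steps
total_iterations    = num_env_steps // steps_per_iteration     # 6,250 iterations

# The reward curve in question stops at roughly 6M env steps.
steps_so_far  = 6_000_000
fraction_done = steps_so_far / num_env_steps                    # 0.3 -> only 30% of the budget
print(total_iterations, fraction_done)
```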