MicroRTS-Py
Reproduce Gridnet's SOTA agent with TrueSkill Evaluation
Now that we are trying to get the self-play agent working, it's important to set baselines that we want to match and surpass. Our best past experiment is this one (which, I just realized, Chris had run with --num-bot-envs 48), and it achieves a TrueSkill of 35.55 (source).
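To make the TrueSkill number easier to interpret, here is a minimal sketch of how a two-player TrueSkill rating moves after one decisive game. It assumes the standard defaults (mu = 25, sigma = 25/3, beta = sigma/2, tau = sigma/100) and ignores draws; the `trueskill` Python package models draws by default, so its numbers will differ slightly. The exact evaluation setup behind 35.55 (opponent pool, draw handling, whether it reports mu or a conservative mu - 3*sigma) isn't stated here, and `Rating`/`rate_1vs1` below are hypothetical names for this sketch.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Assumed default TrueSkill constants (not taken from the MicroRTS-Py code).
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2     # per-game performance noise
TAU = SIGMA0 / 100    # dynamics noise added before each update


@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0


def _pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    # erfc form stays accurate for large negative x (big upsets).
    return 0.5 * math.erfc(-x / math.sqrt(2))


def rate_1vs1(winner: Rating, loser: Rating) -> Tuple[Rating, Rating]:
    """Hypothetical helper: update both ratings after a win (no draws)."""
    var_w = winner.sigma ** 2 + TAU ** 2
    var_l = loser.sigma ** 2 + TAU ** 2
    c = math.sqrt(2 * BETA ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # how far the means shift
    w = v * (v + t)        # how much the uncertainty shrinks, in (0, 1)
    new_w = Rating(winner.mu + var_w / c * v,
                   math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_l = Rating(loser.mu - var_l / c * v,
                   math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_w, new_l


# Usage: a fresh agent beats a fresh bot ten times in a row (made-up results).
agent, bot = Rating(), Rating()
for _ in range(10):
    agent, bot = rate_1vs1(agent, bot)
print(f"agent mu={agent.mu:.2f} sigma={agent.sigma:.2f}")
```

Each win raises the winner's mu and lowers sigma, but the gains shrink as the gap between the two ratings grows, which is why a high TrueSkill requires beating strong opponents rather than just winning many games.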
I am going to try to reproduce it with 24 bot environments to see if we can get the same level of performance:
python ppo_gridnet.py \
--num-bot-envs 24 --num-selfplay-envs 0 \
--total-timesteps 100000000 --num-models 300 \
--capture-video --prod-mode
After this, I am going to check whether I can reproduce the same results with the new vecenv implementation in #34.
Unfortunately, I wasn't able to reproduce the best past results in https://wandb.ai/costa-huang/gym-microrts/runs/17moy8qp. Maybe I need to run with the default parameters used in https://wandb.ai/vwxyzjn/gym-microrts-paper/runs/asrpz468 (--num-bot-envs 48).
Actually, I am going to use #34 to run the following; I'd rather not wait another two weeks to reproduce the original results.
python ppo_gridnet.py \
--num-bot-envs 48 --num-selfplay-envs 0 \
--total-timesteps 300000000 --num-models 300 \
--capture-video --prod-mode
It turns out I don't have enough memory, so I had to run with --num-bot-envs 24:
python ppo_gridnet.py \
--num-bot-envs 24 --num-selfplay-envs 0 \
--total-timesteps 300000000 --num-models 300 \
--capture-video --prod-mode
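One thing worth keeping in mind when comparing these runs: with --num-selfplay-envs 0, the number of environments is just --num-bot-envs, so changing it changes the PPO batch size and the number of updates even at the same --total-timesteps. Below is a back-of-envelope sketch, assuming each update collects num_steps transitions from every env (num_steps=256 is an assumption about the script's default) and that --num-models checkpoints are spaced evenly over the updates; `run_budget` is a hypothetical helper, not part of ppo_gridnet.py.

```python
def run_budget(total_timesteps: int, num_envs: int,
               num_steps: int = 256, num_models: int = 300) -> dict:
    """Hypothetical helper: batch size, update count, and checkpoint spacing."""
    batch_size = num_envs * num_steps               # transitions per PPO update
    num_updates = total_timesteps // batch_size     # full updates that fit
    save_every = max(1, num_updates // num_models)  # updates between checkpoints
    return {"batch_size": batch_size, "num_updates": num_updates,
            "save_every": save_every}


# The three configurations tried in this issue.
for envs, steps in [(24, 100_000_000), (48, 300_000_000), (24, 300_000_000)]:
    print(f"envs={envs} total={steps:,}: {run_budget(steps, envs)}")
```

Under these assumptions, the 24-env run at 300M timesteps performs twice as many updates as the 48-env run, each on half as much data, so it is not an exact replica of the 48-env baseline even though the total sample budget matches.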
Progress is tracked in this report: https://wandb.ai/costa-huang/gym-microrts/reports/MicroRTSGridModeSharedMemVecEnv---VmlldzoxNDYwNDE0
A new run, https://wandb.ai/costa-huang/gym-microrts/runs/2v658xqx/logs?workspace=user-costa-huang, seems successful, although the TrueSkill evaluation is a bit buggy (see #41).
Now trying to reproduce the same results with MicroRTSGridModeSharedMemVecEnv from #34 in https://wandb.ai/gym-microrts/gym-microrts/runs/39stn3xh.
I was able to reproduce the same results with MicroRTSGridModeSharedMemVecEnv.
Also, SPS (steps per second) is about 10% higher! If we could make the neural network faster, SPS would improve even further.
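As a rough illustration of the SPS comparison, here is a minimal sketch, assuming SPS is logged as global steps divided by wall-clock time since the start of training, and that environment stepping and network work run sequentially so their per-step costs add. All numbers and helper names below are hypothetical, not measurements from these runs.

```python
import time
from typing import Optional


def steps_per_second(global_step: int, start_time: float,
                     now: Optional[float] = None) -> float:
    """Average env steps per second since start_time (wall-clock)."""
    elapsed = (time.time() if now is None else now) - start_time
    return global_step / elapsed


def relative_speedup(old_sps: float, new_sps: float) -> float:
    """Fractional SPS change, e.g. 0.10 means 10% faster."""
    return new_sps / old_sps - 1.0


def sps_from_costs(env_ms: float, nn_ms: float) -> float:
    """SPS if each step costs env_ms of simulation plus nn_ms of
    (amortized) network time, executed one after the other."""
    return 1000.0 / (env_ms + nn_ms)


# Hypothetical numbers only.
baseline = steps_per_second(1_000_000, start_time=0.0, now=1000.0)
shared_mem = steps_per_second(1_100_000, start_time=0.0, now=1000.0)
print(f"{relative_speedup(baseline, shared_mem):.0%}")  # prints "10%"

# Once the env side is cheaper, the network's share of each step dominates,
# so speeding up the network is where further SPS gains would come from.
print(sps_from_costs(env_ms=0.5, nn_ms=0.5), sps_from_costs(env_ms=0.5, nn_ms=0.25))
```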