Reproduce Gridnet's SOTA agent with TrueSkill Evaluation

Open vwxyzjn opened this issue 3 years ago • 7 comments

Now that we are trying to get the self-play agent working, it's important to set baselines that we want to match and exceed. Our best past experiment is this (which I just now realized Chris had run with --num-bot-envs 48), which achieved a TrueSkill of 35.55 (source).

image

I am going to try to reproduce this with --num-bot-envs 24 --num-selfplay-envs 0 --total-timesteps 100000000 --num-models 300 and see if we can reach the same level of performance. The full command:

python ppo_gridnet.py \
    --num-bot-envs 24 --num-selfplay-envs 0 \
    --total-timesteps 100000000 --num-models 300 \
    --capture-video --prod-mode

After this, I am going to check whether I can reproduce the same results with the new vecenv implementation in #34.
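
For a rough sense of how far apart the saved models are in the commands in this thread, here is a minimal sketch. It assumes that --num-models is the number of checkpoints saved at even intervals over --total-timesteps; that reading is an assumption, and the actual script may space checkpoints by update count rather than by timestep. The helper name timesteps_per_checkpoint is hypothetical.

```python
def timesteps_per_checkpoint(total_timesteps: int, num_models: int) -> int:
    """Approximate spacing between saved models.

    Assumption: checkpoints are spread evenly over the whole run.
    """
    if num_models <= 0:
        raise ValueError("num_models must be positive")
    return total_timesteps // num_models


# Values taken from the commands quoted in this thread.
short_run = timesteps_per_checkpoint(100_000_000, 300)
long_run = timesteps_per_checkpoint(300_000_000, 300)
print(short_run)  # 333333
print(long_run)   # 1000000
```

Under that assumption, tripling --total-timesteps while keeping --num-models 300 also triples the gap between the models that get evaluated.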

vwxyzjn avatar Jan 18 '22 02:01 vwxyzjn

Unfortunately, I wasn't able to reproduce the best past results in https://wandb.ai/costa-huang/gym-microrts/runs/17moy8qp. Maybe I need to run with the default parameters in https://wandb.ai/vwxyzjn/gym-microrts-paper/runs/asrpz468 (--num-bot-envs 48)

vwxyzjn avatar Jan 19 '22 15:01 vwxyzjn

Actually, I am going to try to use #34 to run the following. I'd rather not have to wait for 2 weeks again to reproduce the original results.

python ppo_gridnet.py \
    --num-bot-envs 48 --num-selfplay-envs 0 \
    --total-timesteps 300000000 --num-models 300 \
    --capture-video --prod-mode

Turns out I don't have that much memory, so I had to run with --num-bot-envs 24:

python ppo_gridnet.py \
    --num-bot-envs 24 --num-selfplay-envs 0 \
    --total-timesteps 300000000 --num-models 300 \
    --capture-video --prod-mode
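
Since the two commands above differ only in --num-bot-envs, one way to keep them in sync is to build the argument list from a small helper. This is only a sketch: build_command is a hypothetical name, and the flags are copied from the commands in this thread rather than from the script's argument parser.

```python
import shlex


def build_command(num_bot_envs, total_timesteps=300_000_000, num_models=300):
    """Assemble the ppo_gridnet.py invocation quoted in this thread as an argv list."""
    return [
        "python", "ppo_gridnet.py",
        "--num-bot-envs", str(num_bot_envs),
        "--num-selfplay-envs", "0",
        "--total-timesteps", str(total_timesteps),
        "--num-models", str(num_models),
        "--capture-video", "--prod-mode",
    ]


# Print both variants: the original 48-env setting and the 24-env fallback.
for n in (48, 24):
    print(shlex.join(build_command(n)))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting issues.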

vwxyzjn avatar Jan 19 '22 19:01 vwxyzjn

https://wandb.ai/costa-huang/gym-microrts/reports/MicroRTSGridModeSharedMemVecEnv---VmlldzoxNDYwNDE0 tracks this progress

vwxyzjn avatar Jan 19 '22 20:01 vwxyzjn

A new run https://wandb.ai/costa-huang/gym-microrts/runs/2v658xqx/logs?workspace=user-costa-huang seems successful, although the TrueSkill evaluation is a bit buggy: see #41

vwxyzjn avatar Jan 22 '22 18:01 vwxyzjn

This run successfully reproduced past best results. Closing the issue now.

image

vwxyzjn avatar Jan 24 '22 16:01 vwxyzjn

Now trying to reproduce the same results with MicroRTSGridModeSharedMemVecEnv from #34 in https://wandb.ai/gym-microrts/gym-microrts/runs/39stn3xh

vwxyzjn avatar Jan 24 '22 20:01 vwxyzjn

Was able to reproduce the same results with MicroRTSGridModeSharedMemVecEnv.

image

Also, SPS is about 10% faster! If we could make the NN faster, SPS would be even higher.

image
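
To put the roughly 10% SPS gain in terms of wall-clock time, here is a small worked calculation. The baseline SPS of 1000 is a made-up placeholder, not a measured value, and the 300M timestep budget is borrowed from the earlier command purely for illustration; the relative saving does not depend on either number.

```python
def wall_clock_hours(total_timesteps: int, sps: float) -> float:
    """Hours needed to collect total_timesteps at a constant steps-per-second rate."""
    return total_timesteps / sps / 3600


baseline_sps = 1000.0              # hypothetical placeholder, not a measurement
faster_sps = baseline_sps * 1.10   # "about 10% faster"
total = 300_000_000                # illustrative budget from the earlier command

base_h = wall_clock_hours(total, baseline_sps)
fast_h = wall_clock_hours(total, faster_sps)
saving = 1 - fast_h / base_h       # equals 1 - 1/1.10

print(f"relative time saved: {saving:.1%}")  # relative time saved: 9.1%
```

A 10% higher SPS therefore shortens a fixed-budget run by about 9%, not 10%, because time scales with the inverse of throughput.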

vwxyzjn avatar Jan 28 '22 21:01 vwxyzjn