pysc2-examples
It seems DQN can't learn much
I ran the script last night. It started with ~11 mean reward, and ended with ~15.5 mean reward.
I tried to play this mini-game myself, and I could get ~100 score or more.
Deepmind reached ~100 score in their video.
I had kind of a similar experience, but mine actually dropped from 10 to 5 :D
Here is what the net learned on my laptop; the marines mostly hang out at the bottom of the screen.
Yeah, guys. I'm trying to improve the score using the A3C algorithm. I'm rewriting the example code.
If you have any improvement, please let me know! :)
I'm applying the A3C algorithm to it. This is the baseline agent from the paper: https://deepmind.com/documents/110/sc2le.pdf
Awesome! Will try it soon
@chris-chris How's it going with A3C? I see you changed the approach; is it getting any better?
@Seraphli @ShadowDancer @vors @yilei
I applied the A2C algorithm. I think it works better. You can train it with the command below:
python train_mineral_shards.py --algorithm=a2c --num_agents=2 --num_scripts=2 --timesteps=2000000
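In case it helps anyone scripting runs: a minimal argparse sketch matching those flags (the choices and defaults here are my guesses, not the repo's actual definitions):

```python
import argparse

# Flag names taken from the training command above; defaults are assumptions.
parser = argparse.ArgumentParser(description="Train on CollectMineralShards")
parser.add_argument("--algorithm", choices=["deepq", "a2c"], default="a2c")
parser.add_argument("--num_agents", type=int, default=2)
parser.add_argument("--num_scripts", type=int, default=2)
parser.add_argument("--timesteps", type=int, default=2000000)

args = parser.parse_args(
    "--algorithm=a2c --num_agents=2 --num_scripts=2 --timesteps=2000000".split())
print(args.algorithm, args.num_agents, args.num_scripts, args.timesteps)
```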
I tried to run the code, and at some point the program threw this error:
Traceback (most recent call last):
  File "train_mineral_shards.py", line 304, in <module>
    main()
  File "train_mineral_shards.py", line 183, in main
    callback=a2c_callback)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 748, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 621, in run
    self.update_obs(obs)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 297, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'
I'm having the same KeyError: '1' issue as @Seraphli (output nearly identical to the above). Any idea where to look?
@davidkuhta @Seraphli I'll fix it! thanks!!
@davidkuhta @Seraphli I fixed it. Can you guys check it out?
Thanks @chris-chris! Running it now, will follow-up
OK, I still ran into the same issue. I did see the initialization in the last commit:
self.xy_per_marine = [{"0":[0,0], "1":[0,0]} for _ in range(nenv)]
I re-ran after adding a print statement at line 296 to output the self.xy_per_marine[env_num] dict.
Here's how it ended:
...
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
rewards : [0 0 0 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
Game has started.
init group list
env 2 done! reward : 130.0 mean_100ep_reward : 84.7
rewards : [0 0 1 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'0': [11, 11]}
Traceback (most recent call last):
  File "train_mineral_shards.py", line 302, in <module>
    main()
  File "train_mineral_shards.py", line 181, in main
    callback=a2c_callback)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 749, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 622, in run
    self.update_obs(obs)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 298, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'
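Looking at the log, the entry for marine "1" disappears right after env 2 finishes an episode and resets (only '0' is left in that env's dict), which is exactly when the KeyError fires. A minimal defensive sketch of that lookup; the helper name and the [0, 0] fallback are my own assumptions, not the repo's code:

```python
def get_marine_xy(xy_for_env, marine_id, default=(0, 0)):
    """Return the tracked (x, y) for a marine, falling back to a default
    position when the entry has vanished (e.g. right after an episode reset)."""
    return list(xy_for_env.get(marine_id, default))

# Mirrors the state seen in the log after env 2 reset:
env_state = {"0": [11, 11]}
marine0 = get_marine_xy(env_state, "0")  # [11, 11]
marine1 = get_marine_xy(env_state, "1")  # [0, 0] instead of a KeyError
```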
Just in case anyone would like to look at an alternative work-in-progress implementation without the openai-baselines dependency and with a complete action space: https://github.com/simonmeister/pysc2-rl-agents.
Has anyone encountered this error?
TypeError: Can't instantiate abstract class SubprocVecEnv with abstract methods step_async, step_wait
@mushroom1116 I changed pysc2-examples/common/vec_env/subproc_vec_env.py
from baselines.common.vec_env import VecEnv
to
from . import VecEnv
and it runs now.
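For what it's worth, that TypeError is standard Python ABC behavior: presumably the VecEnv base class being imported from the installed baselines declares step_async/step_wait as abstract, so a subclass that doesn't override them can't be instantiated. A toy sketch (not the baselines code itself) reproducing the error:

```python
from abc import ABC, abstractmethod

# If the imported base class marks these methods abstract and the subclass
# never overrides them, Python refuses to instantiate the subclass.
class VecEnv(ABC):
    @abstractmethod
    def step_async(self, actions): ...
    @abstractmethod
    def step_wait(self): ...

class BrokenSubprocVecEnv(VecEnv):
    pass  # no overrides -> TypeError at instantiation

class WorkingSubprocVecEnv(VecEnv):
    def step_async(self, actions):
        self.pending = actions
    def step_wait(self):
        return self.pending

try:
    BrokenSubprocVecEnv()
except TypeError as e:
    print(e)  # "Can't instantiate abstract class ..."

WorkingSubprocVecEnv()  # fine once both methods are implemented
```

Switching the import to the repo's local VecEnv sidesteps this because that base class evidently doesn't declare those abstract methods.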