MicroRTS-Py
Faster Convergence
Training an agent still takes a long time: the particular experiment in #36 took 4d 9h 11m 14s to finish.
Looking at the reward chart, it appears the agent reaches about 70% of its final performance within just 50M steps (roughly 10 hours into training).
We should therefore try to optimize for a 10-hour computational budget.
I think the bottleneck is still largely on the NN side, so one thing worth trying is reducing the NN size.
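To get a feel for how much shrinking the network buys, here is a rough back-of-the-envelope parameter count for a small conv stack. The channel widths below are hypothetical, purely for illustration, not the repo's actual architecture:

```python
def conv2d_params(in_ch, out_ch, k):
    """Parameter count of one Conv2d layer: weights (in*out*k*k) plus biases (out)."""
    return in_ch * out_ch * k * k + out_ch

# Hypothetical widths for illustration only.
full = conv2d_params(27, 32, 3) + conv2d_params(32, 64, 3) + conv2d_params(64, 128, 3)
half = conv2d_params(27, 16, 3) + conv2d_params(16, 32, 3) + conv2d_params(32, 64, 3)

print(full, half)  # halving each hidden width cuts conv parameters by ~3-4x
```

Since the forward/backward cost of the conv layers scales roughly with their parameter count, a halved-width network could noticeably cut wall-clock time per step, at some (hopefully small) cost in final performance.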
Alternatively, I noticed that learning rate annealing seems to really help the algorithm converge toward the end, so we could also try a smaller learning rate with annealing turned off.
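The idea, sketched below with hypothetical numbers (the `2.5e-4` starting rate and the fixed `1e-4` alternative are illustrative, not the experiment's actual settings): instead of linearly annealing the learning rate to zero over the full run, start at a smaller fixed rate and skip annealing entirely.

```python
def annealed_lr(initial_lr, step, total_steps):
    """Linear annealing to zero, as commonly done in PPO implementations."""
    return initial_lr * (1.0 - step / total_steps)

# Annealed schedule: starts high, ends at zero.
start = annealed_lr(2.5e-4, 0, 100)
end = annealed_lr(2.5e-4, 100, 100)

# Hypothetical alternative: a fixed smaller rate, no schedule at all.
fixed_lr = 1e-4

print(start, end, fixed_lr)
```

If the late-training benefit really comes from the rate simply being small by then, a constant small rate might recover most of the convergence benefit within the 10-hour budget.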
Maybe we could also tune the discount factor (and we should visualize the discounted returns, which are what the agent actually optimizes for).
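For the visualization, the quantity to plot per episode is the discounted return G_t = r_t + γ·G_{t+1}. A minimal sketch of the backward computation (the reward list and γ below are toy values for illustration):

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

print(discounted_returns([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
```

Plotting these alongside the raw reward chart would show how sensitive the optimization target actually is to the choice of γ.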
#56 tries to address this issue.