rocket-learn
Improve gamemode weighting
This changes the gamemode weighting to be faster (no Redis calls) and more stable, so that it doesn't swing around as much. Each training step should have a very similar gamemode distribution, close to the desired weights, even if the training steps are very short (I've tested with 100k steps).
You can use this algorithm to get more stable sampling probabilities over time:
1. Keep an estimate of the mean experience generated in each gamemode.
2. Calculate empirical distribution weights:
W_emp = mean_exp / sum(mean_exp)
3. Calculate corrected weights based on these estimates:
W_cor = W_target / W_emp
4. Calculate corrected sampling probs:
P = W_cor / sum(W_cor)
For step 1, you can use an EMA initialized based on agent count or anything other than 0, e.g. mean_exp = {'1v1': 1000, '2v2': 2000, '3v3': 3000}.
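A minimal sketch of these steps in Python might look like the following (the names `target_weights`, `ema_alpha`, `update_mean_exp`, and `sampling_probs` are illustrative only, not part of rocket-learn):

```python
import numpy as np

# Desired share of experience per gamemode (illustrative values).
target_weights = {'1v1': 1/3, '2v2': 1/3, '3v3': 1/3}

# Step 1: EMA of experience generated per gamemode, initialized from agent
# count (anything non-zero works) rather than 0.
mean_exp = {'1v1': 1000.0, '2v2': 2000.0, '3v3': 3000.0}
ema_alpha = 0.02  # EMA smoothing factor (assumed value)


def update_mean_exp(gamemode: str, new_exp: float) -> None:
    """Step 1: update the EMA of experience generated in `gamemode`."""
    mean_exp[gamemode] += ema_alpha * (new_exp - mean_exp[gamemode])


def sampling_probs() -> dict:
    """Steps 2-4: empirical weights -> corrected weights -> sampling probs."""
    modes = list(mean_exp)
    exp = np.array([mean_exp[m] for m in modes])
    target = np.array([target_weights[m] for m in modes])

    w_emp = exp / exp.sum()      # empirical distribution of generated experience
    w_cor = target / w_emp       # boost under-represented gamemodes
    probs = w_cor / w_cor.sum()  # normalize into sampling probabilities
    return dict(zip(modes, probs))
```

Each worker would call `update_mean_exp` with however much experience it actually produced and re-read `sampling_probs` before picking its next gamemode.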
What's the conclusion here?
The conclusion is that I got busy and didn't finish it, but it's still on my list. I'm going to take the suggestions, I just haven't finished yet.
Ok, this is ready and tested. It uses the EMA for the weights, per worker. Experience is counted as the actual experience generated, which means that if you're using pretrained agents or past models, those percentages naturally fall out of the generated experience, which I think is ideal.
Added one commit for the related 1v0 fixes.
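As a rough illustration of what the per-worker loop described above could look like, building on the sketch earlier in the thread (`run_episode` is a made-up placeholder, not rocket-learn's actual API):

```python
import numpy as np

rng = np.random.default_rng()


def run_episode(gamemode: str) -> int:
    # Placeholder: pretend larger gamemodes yield more timesteps per episode.
    return {'1v1': 900, '2v2': 1800, '3v3': 2700}[gamemode]


for _ in range(1000):
    probs = sampling_probs()                    # corrected sampling probabilities
    modes, p = zip(*probs.items())
    gamemode = rng.choice(modes, p=np.asarray(p, dtype=float))
    generated = run_episode(gamemode)           # actual experience produced
    # Only actual generated experience is counted, so gamemodes where
    # pretrained agents or past models fill some seats contribute less,
    # and the corrected weights compensate automatically.
    update_mean_exp(gamemode, generated)
```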