rocket-learn
Improve gamemode weighting
This changes the gamemode weighting to be faster (no Redis calls) and more stable, so that it doesn't swing around as much. Each training step should have a very similar gamemode distribution, close to the desired weights, even if the training steps are very short (I've tested with 100k steps).
You can use this algorithm to get more stable sampling probabilities over time:
1. Keep an estimate of the mean experience generated in each gamemode.
2. Calculate empirical distribution weights:
W_emp = mean_exp / sum(mean_exp)
3. Calculate corrected weights based on these estimates:
W_cor = W_target / W_emp
4. Calculate corrected sampling probs:
P = W_cor / sum(W_cor)
For step 1, you can use an EMA initialized based on agent count or anything other than 0, e.g. mean_exp = {'1v1': 1000, '2v2': 2000, '3v3': 3000}.
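A minimal sketch of these steps in Python might look like the following (the names `target_weights`, `ema_alpha`, `update_mean_exp`, and `sampling_probs` are illustrative only, not part of rocket-learn):

```python
import numpy as np

# Desired share of experience per gamemode (illustrative values).
target_weights = {'1v1': 1/3, '2v2': 1/3, '3v3': 1/3}

# Step 1: EMA of experience generated per gamemode, initialized from agent
# count (anything non-zero works) rather than 0.
mean_exp = {'1v1': 1000.0, '2v2': 2000.0, '3v3': 3000.0}
ema_alpha = 0.02  # EMA smoothing factor (assumed value)


def update_mean_exp(gamemode: str, new_exp: float) -> None:
    """Step 1: update the EMA of experience generated in `gamemode`."""
    mean_exp[gamemode] += ema_alpha * (new_exp - mean_exp[gamemode])


def sampling_probs() -> dict:
    """Steps 2-4: empirical weights -> corrected weights -> sampling probs."""
    modes = list(mean_exp)
    exp = np.array([mean_exp[m] for m in modes])
    target = np.array([target_weights[m] for m in modes])

    w_emp = exp / exp.sum()      # empirical distribution of generated experience
    w_cor = target / w_emp       # boost under-represented gamemodes
    probs = w_cor / w_cor.sum()  # normalize into sampling probabilities
    return dict(zip(modes, probs))
```

Each worker would call `update_mean_exp` with however much experience it actually produced and re-read `sampling_probs` before picking its next gamemode.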
What's the conclusion here?
The conclusion is that I got busy and didn't finish it, but it's still on my list. I'm going to take the suggestions, I just haven't finished yet.
Ok, this is ready and tested. It uses the EMA for the weights, per worker. Experience is counted as the actual experience generated, which means that if you're using pretrained agents or past models, those percentages naturally fall out of the generated experience, which I think is ideal.
Added one commit for the related 1v0 fixes.
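As a rough illustration of what the per-worker loop described above could look like, building on the sketch earlier in the thread (`run_episode` is a made-up placeholder, not rocket-learn's actual API):

```python
import numpy as np

rng = np.random.default_rng()


def run_episode(gamemode: str) -> int:
    # Placeholder: pretend larger gamemodes yield more timesteps per episode.
    return {'1v1': 900, '2v2': 1800, '3v3': 2700}[gamemode]


for _ in range(1000):
    probs = sampling_probs()                    # corrected sampling probabilities
    modes, p = zip(*probs.items())
    gamemode = rng.choice(modes, p=np.asarray(p, dtype=float))
    generated = run_episode(gamemode)           # actual experience produced
    # Only actual generated experience is counted, so gamemodes where
    # pretrained agents or past models fill some seats contribute less,
    # and the corrected weights compensate automatically.
    update_mean_exp(gamemode, generated)
```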