muzero-general

Optimization of some parameters for tictactoe.

Open AdrianAcala opened this issue 3 years ago • 4 comments

Here are some optimizations for tictactoe. My CPU has 12 threads, so I used 12 workers; other than that, I tested these settings multiple times and they gave great results.
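The worker count above was tied to the number of CPU threads. A minimal sketch of deriving it automatically, assuming the game config exposes a `num_workers` attribute as muzero-general's game files do; the class name `TicTacToeConfigSketch` is hypothetical and shows only this one setting:

```python
import os


class TicTacToeConfigSketch:
    """Hypothetical stand-in for a game config; only the worker count is shown."""

    def __init__(self):
        # os.cpu_count() reports logical CPUs (hardware threads) and may return None.
        threads = os.cpu_count() or 1
        # One self-play worker per hardware thread, e.g. 12 on a 12-thread CPU.
        self.num_workers = threads
```

On a 12-thread machine this yields 12 workers. Whether one worker per thread is optimal also depends on memory and on whether training shares the same CPU.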

CC: @theword / @JohnPPP

[Image: plot of the first run]

[Image: plot of the second run]

AdrianAcala · Aug 05 '21 03:08

Let me know if you want me to also include the best model I can create.

AdrianAcala · Aug 06 '21 02:08

Hi Adrian,

Thanks!

Are your plots showing the results against the expert opponent or against the random opponent? It looks really good.

The initial learning rate might be a bit high: I sometimes got NaNs at the beginning, but when training avoids NaNs during the first 500 games, it seems pretty robust in its progress afterwards.
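One common way to soften a high initial learning rate is a linear warm-up. The sketch below swaps in that technique (it is not something muzero-general is confirmed to provide): the rate ramps up over the first `warmup_steps` training steps and then decays exponentially in the form `lr_init * decay_rate ** (t / decay_steps)`. The function name and all parameter values are hypothetical placeholders, not the values from this PR.

```python
def warmup_then_decay_lr(step, lr_init, warmup_steps, decay_rate, decay_steps):
    """Hypothetical schedule: linear warm-up to lr_init, then exponential decay."""
    if step < warmup_steps:
        # Ramp up from a small value so the earliest updates are gentle.
        return lr_init * (step + 1) / warmup_steps
    # After warm-up, decay exponentially from lr_init.
    return lr_init * decay_rate ** ((step - warmup_steps) / decay_steps)
```

Note that warm-up is counted in training steps, while the 500 mentioned above is a number of games, so the two are not directly comparable.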

Maybe adding an option to pre-fill the replay buffer with random games could help avoid these NaNs at the start.
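A minimal sketch of what pre-filling with random games could look like for tictactoe. Everything here is hypothetical: the helper names, the simplified `(observations, actions, rewards)` record (muzero-general's real game history also stores search statistics such as root values and child visit counts, which random games would not have), and the reward scheme of +1 for the winning move.

```python
import random
from collections import deque

# The eight winning lines on a 3x3 board indexed 0..8.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def play_random_game(rng):
    """Play one uniformly random game; return (observations, actions, rewards)."""
    board = [0] * 9
    player = 1
    observations, actions, rewards = [], [], []
    # Stop when someone has won or the board is full.
    while winner(board) == 0 and 0 in board:
        legal = [i for i, v in enumerate(board) if v == 0]
        action = rng.choice(legal)
        observations.append(tuple(board))  # state before the move
        board[action] = player
        actions.append(action)
        # Assumed reward scheme: +1 for the move that wins, 0 otherwise.
        rewards.append(1 if winner(board) == player else 0)
        player = -player
    return observations, actions, rewards


def prefill_replay_buffer(num_games, capacity, seed=0):
    """Hypothetical helper: fill a bounded buffer with random games."""
    rng = random.Random(seed)
    buffer = deque(maxlen=capacity)  # oldest games are evicted first
    for _ in range(num_games):
        buffer.append(play_random_game(rng))
    return buffer
```

For example, `prefill_replay_buffer(500, capacity=3000)` would seed the buffer with 500 random games before self-play starts; the numbers are illustrative only.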

If you have reasonably good weights, it would be great to add them to the repo too!

werner-duvaud · Aug 06 '21 03:08

@werner-duvaud , the results are against the expert opponent.

Yeah. The learning rate is quite high. I ran several tests to see whether I would hit NaNs, and I didn't, but if you did, we'll need to test a bit more. One of the articles I read mentioned that a periodic learning rate was helpful, and it would make a lot of sense here as well. Maybe I can add that too, behind a feature flag.
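As one concrete example of a periodic learning rate, here is a triangular cyclical schedule, which rises linearly from a base rate to a peak over `half_cycle` steps and falls back over the next `half_cycle`. This is a sketch of one such technique, not necessarily the one from the article mentioned above; the function names, the `use_cyclic_lr` flag, and the parameters are hypothetical.

```python
def triangular_cyclic_lr(step, base_lr, max_lr, half_cycle):
    """Triangular cyclical learning rate with period 2 * half_cycle steps."""
    cycle_pos = step % (2 * half_cycle)
    # Distance from the cycle midpoint, scaled to [0, 1]: 1 at the ends, 0 at the peak.
    x = abs(cycle_pos / half_cycle - 1)
    return base_lr + (max_lr - base_lr) * (1 - x)


def select_lr(step, use_cyclic_lr, lr_init, max_lr, half_cycle):
    """Hypothetical feature flag: constant lr_init unless cycling is enabled."""
    if not use_cyclic_lr:
        return lr_init
    return triangular_cyclic_lr(step, lr_init, max_lr, half_cycle)
```

With `base_lr=0.001`, `max_lr=0.01`, and `half_cycle=100`, the rate is 0.001 at step 0, 0.0055 at step 50, 0.01 at step 100, and back to 0.001 at step 200. Keeping it behind a flag leaves the existing schedule unchanged by default.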

My weights were really good. One time I tried playing it and the game ended abruptly. At first, I thought there was a bug, then I realized I lost. 😆

I'll train it for longer to see what we get.

AdrianAcala · Aug 06 '21 04:08

@AdrianAcala How much RAM did you have?

isaiah2004 · Feb 04 '24 20:02