Sacha Chernyavskiy
Yes, exactly. 1) `flax.nnx` is not fully working yet; some minor fixes are left (e.g. the conv dimension calculation and tests). 2) The implementation does work; however, some problems have come up...
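In case it helps pin down the conv-dimension point, here is a minimal sketch of how the output dimensions flow through an `nnx.Conv` into a dense head, assuming an NHWC input and SAME padding; the tiny network, shapes, and names below are illustrative only, not the actual model in this PR:

```python
import jax.numpy as jnp
from flax import nnx

# Illustrative only: a tiny conv stack showing how the flattened dimension is
# derived before the dense head (not the PR's actual model).
class TinyConvNet(nnx.Module):
    def __init__(self, rngs: nnx.Rngs):
        # Unlike linen's nn.Conv, nnx.Conv takes in_features explicitly.
        self.conv = nnx.Conv(in_features=3, out_features=8,
                             kernel_size=(3, 3), padding="SAME", rngs=rngs)
        # SAME padding with stride 1 keeps the 3x3 spatial dims, so the
        # flattened size is H * W * out_features = 3 * 3 * 8 = 72.
        self.dense = nnx.Linear(in_features=3 * 3 * 8, out_features=9, rngs=rngs)

    def __call__(self, x):
        x = self.conv(x)               # (batch, 3, 3, 8)
        x = x.reshape(x.shape[0], -1)  # (batch, 72)
        return self.dense(x)           # (batch, 9), e.g. one logit per TTT cell


model = TinyConvNet(rngs=nnx.Rngs(0))
print(model(jnp.zeros((4, 3, 3, 3))).shape)  # (4, 9)
```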
Of course, we will notify you!
@lanctot, there are now tests for both APIs, `linen` and `nnx`, and they're passing. The only minor things left on the development side are ~~model export and the changelog~~ benchmarks...
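For the record, the backend-parity tests are roughly of this shape: one parametrized test exercising both APIs with the same assertions. The toy stand-in models below are mine for illustration, not the networks in the PR:

```python
import jax
import jax.numpy as jnp
import pytest
from flax import linen as nn
from flax import nnx

NUM_ACTIONS = 9          # tic-tac-toe
OBS_SHAPE = (3, 3, 3)    # board planes


class LinenPolicy(nn.Module):
    @nn.compact
    def __call__(self, x):
        return nn.Dense(NUM_ACTIONS)(x.reshape((x.shape[0], -1)))


class NnxPolicy(nnx.Module):
    def __init__(self, rngs: nnx.Rngs):
        self.dense = nnx.Linear(3 * 3 * 3, NUM_ACTIONS, rngs=rngs)

    def __call__(self, x):
        return self.dense(x.reshape((x.shape[0], -1)))


def linen_forward(obs):
    model = LinenPolicy()
    params = model.init(jax.random.PRNGKey(0), obs)
    return model.apply(params, obs)


def nnx_forward(obs):
    return NnxPolicy(rngs=nnx.Rngs(0))(obs)


# One parametrized test covers both backends with identical assertions.
@pytest.mark.parametrize("forward", [linen_forward, nnx_forward],
                         ids=["linen", "nnx"])
def test_policy_logits_shape(forward):
    obs = jnp.zeros((4,) + OBS_SHAPE)
    assert forward(obs).shape == (4, NUM_ACTIONS)
```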
@lanctot, I ran a TTT experiment for a much longer time. It doesn't look good, does it? Does the picture suggest what I should look at to find the bugs?
@lanctot The main difference between the graphs is the buffer/batch size: 2 ** 16 and 2 ** 10, which were the default values for the model. I use the default value of averaging...
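Just to make the scale of that difference explicit (only the two numbers come from the comment above; which one maps to buffer vs. batch is as described there):

```python
# 2 ** 16 vs 2 ** 10: the two sizes differ by a factor of 64.
larger = 2 ** 16   # 65536
smaller = 2 ** 10  # 1024
print(larger // smaller)  # 64
```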
I will share some progress tomorrow; you can approve the checks later.
I guess we're making slight progress, aren't we? Give it a look, @lanctot.
The latest plots (minor tweaks and fixes here and there). Maybe with much more resources (I used a toy config) there is something here:
I found an example with hyperparameters for tic-tac-toe, and the results look somewhat more intuitive (although I had to reduce the batch size fourfold due to resource constraints).
@lanctot, the TTT example now works much better (see the graphs; the win rate is pretty nice for an example config). However, I want to test connect4 with more steps to make sure...