Benjamin Diamond
@simsim314, in a fork, I have implemented: 1) The DeepMind-style 119 planes of input (see [here](https://github.com/benediamond/chess-alpha-zero/blob/af7caee80929193522afb30fbd66b4ca56935b14/src/chess_zero/env/chess_env.py#L177)). 2) The DeepMind-style NN architecture, with 19 residual layers (see [here](https://github.com/benediamond/chess-alpha-zero/blob/af7caee80929193522afb30fbd66b4ca56935b14/src/chess_zero/agent/model_chess.py#L38)). 3) ...see Akababa's...
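For anyone wondering where 119 comes from, here is a minimal bookkeeping sketch based on the plane breakdown given in the AlphaZero paper (8 history steps, each with piece and repetition planes, plus 7 constant planes). The constant and function names are my own for illustration and are not taken from the fork's code.

```python
# Plane count for the AlphaZero-style chess input, following the paper's table.
# All names here are illustrative, not identifiers from any repository.
HISTORY_STEPS = 8          # number of past positions stacked in the input
PIECE_TYPES = 6            # pawn, knight, bishop, rook, queen, king
REPETITION_PLANES = 2      # repetition counters for each stored position

# Per history step: own pieces + opponent pieces + repetition planes.
PLANES_PER_STEP = 2 * PIECE_TYPES + REPETITION_PLANES

# Constant planes: colour, total move count, 2 castling rights per side,
# and the no-progress (fifty-move) counter.
CONSTANT_PLANES = 1 + 1 + 2 + 2 + 1


def input_planes(history_steps: int = HISTORY_STEPS) -> int:
    """Return the total number of 8x8 input planes for a given history length."""
    return history_steps * PLANES_PER_STEP + CONSTANT_PLANES


print(input_planes())  # 8 * 14 + 7 = 119
```

Shortening the history only changes the first term, so a single-position input under the same scheme would have 21 planes.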
@Zeta36 I'm not sure I understand. As it stands, following DeepMind, we _already_ have a residual tower consisting of 1) A convolutional layer containing two filters 2) 19 residual blocks,...
@simsim314 see the comments on [this thread](https://github.com/Akababa/chess-alpha-zero/issues/3). I'm working on a new version that addresses the "policy flipping" issue; I think Akababa might already have one.
Hi @yhyu13, glad to have you on board. I have managed to train 1 generation so far on a separate fork I have created, in which I have tried certain...
@Zeta36 reaching a model checkpoint appears to take about 20 minutes. From scratch, I have losses `loss: 7.8623 - policy_out_loss: 7.1453 - value_out_loss: 0.3209`. By the time of the...
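A note on reading that log line: Keras reports `loss` as the weighted sum of the per-output losses plus any regularization penalties, so the two heads need not add up to the total. The sketch below computes the leftover, assuming both heads have a loss weight of 1 and that the remainder is regularization; neither assumption is stated above, and `unexplained_loss` is a hypothetical helper, not part of the repository.

```python
def unexplained_loss(total, head_losses, weights=None):
    """Return the part of `total` not accounted for by the weighted head losses.

    Assumes unit loss weights unless `weights` is given (an assumption,
    since the actual weights are not shown in the log).
    """
    if weights is None:
        weights = [1.0] * len(head_losses)
    weighted = sum(w * l for w, l in zip(weights, head_losses))
    return total - weighted


# Values copied from the from-scratch log line.
remainder = unexplained_loss(7.8623, [7.1453, 0.3209])
print(round(remainder, 4))  # 0.3961
```

If the remainder is indeed a weight penalty, it should change only slowly during training, which makes it easy to separate from the policy and value terms when comparing checkpoints.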
I am generating new self-play data, but my optimizer works much faster. So I generate many models on the same batch of self-play data. Yes, it sounds like overfitting. Do...
@Zeta36 you mean, I myself play against it? I haven't tried, but I can. But I don't expect it to be good. My evaluator has replaced the best model only...
@Zeta36 Yes. I think the more parameters, the more chance of overfitting, and I didn't realize this when I implemented the one-hot feature. I have now reduced the learning rate...
Yes, I think the main idea is drawing by threefold repetition, though this too is strange, because each frame in the 8-state history _also_ includes how many times that position has...
@Zeta36 @brianprichardson, as you rightly point out, this new paper changes things. My goal will now become replicating this (the input and output structure of the NN need to be...