AlphaZeroSimple
"We record the state and the probabilities produced by the MCTS." - do you mean board state, priors and values?
In your blog, you emphasize "We record the state and the probabilities produced by the MCTS." Do you mean that we record the board state, the priors, and the values? For reference, this is the line I am looking at in `Trainer.exceute_episode`: `ret.append((hist_state, hist_action_probs, reward * ((-1) ** (hist_current_player != current_player))))`
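To make the question concrete, here is a minimal sketch of how I read that expression. It assumes players are encoded as 1 and -1 and that `reward` is given from `current_player`'s perspective; the `signed_reward` helper and the toy `history` list are hypothetical and not taken from the repository.

```python
def signed_reward(reward, hist_current_player, current_player):
    # A bool is an int in Python, so (-1) ** True == -1 and (-1) ** False == 1.
    # The reward therefore flips sign when the two players differ.
    return reward * ((-1) ** (hist_current_player != current_player))

# Hypothetical episode history: (state, action_probs, player_to_move).
history = [
    ("s0", [0.5, 0.5], 1),
    ("s1", [0.9, 0.1], -1),
    ("s2", [0.2, 0.8], 1),
]
reward, current_player = 1, 1  # assumed final outcome and perspective

# Each training example is (state, action probabilities, signed value).
examples = [
    (state, probs, signed_reward(reward, player, current_player))
    for state, probs, player in history
]
```

If this reading is right, each recorded tuple holds a state, a vector of action probabilities, and a signed outcome, which is why I am unsure whether "probabilities" refers to the priors or to something else.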