alpha-zero-general
Generalizing MCTS to games with a reward at each game step
What are the difficulties in applying the algorithm to games that return a reward at every step? And to infinite games? It seems that only a small change to the MCTS algorithm is needed: take the reward into account when calculating Q. In addition, the game's getNextState function would have to return the reward as well.
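A minimal sketch of the change described above, assuming a single-player game and leaving out two-player sign flipping. Every name here is hypothetical and not the repo's API: `ToyChainGame`, `RewardMCTS`, an `evaluate` callback (uniform priors and a zero value standing in for the neural network), and a `getNextState(state, action)` that returns `(next_state, reward)`. Each simulation backs up `reward + gamma * (return from the next state)` and averages that into Q.

```python
import math
from collections import defaultdict


class ToyChainGame:
    """Hypothetical single-player game used only for illustration.

    State is (position, steps_taken). Action 1 moves right and earns reward 1,
    action 0 stays put and earns 0. The episode ends when position reaches
    `length` or after `max_steps` moves.
    """

    def __init__(self, length=3, max_steps=3):
        self.length = length
        self.max_steps = max_steps

    def getInitState(self):
        return (0, 0)

    def getActionSize(self):
        return 2

    def getNextState(self, state, action):
        # Hypothetical extension: return the per-step reward with the next state.
        pos, t = state
        if action == 1:
            return (pos + 1, t + 1), 1.0
        return (pos, t + 1), 0.0

    def isTerminal(self, state):
        pos, t = state
        return pos >= self.length or t >= self.max_steps


class RewardMCTS:
    """PUCT search where Q(s, a) averages r + gamma * (return from s')."""

    def __init__(self, game, evaluate, c_puct=1.0, gamma=1.0):
        self.game = game
        self.evaluate = evaluate  # state -> (priors, value); stand-in for the network
        self.c_puct = c_puct
        self.gamma = gamma
        self.Qsa = {}                # mean backed-up return for (state, action)
        self.Nsa = defaultdict(int)  # visit count for (state, action)
        self.Ns = defaultdict(int)   # visit count for state
        self.Ps = {}                 # prior policy for expanded states

    def search(self, state):
        """Run one simulation from `state` and return its (discounted) return."""
        if self.game.isTerminal(state):
            return 0.0  # no future reward after the episode ends
        if state not in self.Ps:
            # Leaf: expand it and bootstrap from the value estimate.
            priors, value = self.evaluate(state)
            self.Ps[state] = priors
            return value

        # Select the action with the highest PUCT score.
        best_a, best_u = None, -float("inf")
        for a in range(self.game.getActionSize()):
            q = self.Qsa.get((state, a), 0.0)
            u = q + self.c_puct * self.Ps[state][a] * math.sqrt(self.Ns[state] + 1) / (
                1 + self.Nsa[(state, a)]
            )
            if u > best_u:
                best_a, best_u = a, u

        next_state, reward = self.game.getNextState(state, best_a)
        # The change for per-step rewards: the immediate reward joins the backed-up value.
        g = reward + self.gamma * self.search(next_state)

        key = (state, best_a)
        n = self.Nsa[key]
        self.Qsa[key] = (n * self.Qsa.get(key, 0.0) + g) / (n + 1)
        self.Nsa[key] = n + 1
        self.Ns[state] += 1
        return g


def uniform_eval(state):
    # Hypothetical stand-in for a trained network: uniform priors, zero value.
    return [0.5, 0.5], 0.0


game = ToyChainGame()
mcts = RewardMCTS(game, uniform_eval)
root = game.getInitState()
for _ in range(200):
    mcts.search(root)
counts = [mcts.Nsa[(root, a)] for a in range(game.getActionSize())]
print(counts)  # moving right (action 1) should collect most of the visits
```

Two points the sketch glosses over. First, with per-step rewards Q is no longer confined to [-1, 1], so the balance between Q and the exploration term depends on the reward scale; `c_puct` has to be tuned for it, or Q normalized (MuZero, for example, min-max normalizes Q inside its pUCT rule). Second, for infinite games the search itself is not the obstacle, since each simulation stops at the first unexpanded node. A discount `gamma < 1` keeps returns bounded, but self-play still needs a step cutoff, and value targets for a truncated episode should bootstrap from the value estimate rather than treat the cutoff as a real game end.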