Deep-Reinforcement-Learning-for-Boardgames

Implement JSRL-like training strategy

Open · masus04 opened this issue 1 year ago · 0 comments

This approach, modelled on Jump-Start Reinforcement Learning (JSRL), uses a prior strategy to unroll the game up to a time step t, after which the exploration strategy being trained takes over. As the exploration strategy improves, t is gradually reduced.
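A minimal sketch of how t could be reduced over training. The issue only says that t shrinks "as the exploration strategy improves"; here that is assumed to mean reaching a win-rate threshold. The class `JumpStartSchedule` and its `step` and `threshold` parameters are hypothetical and not part of the repository.

```python
from dataclasses import dataclass


@dataclass
class JumpStartSchedule:
    """Hypothetical curriculum for the hand-over time step t.

    The prior strategy plays the first `t` moves; the exploration
    strategy plays the rest of the game.
    """

    t: int                   # current hand-over time step
    step: int = 1            # assumed: how many moves to remove from t per reduction
    threshold: float = 0.5   # assumed: win rate that counts as "improved"

    def update(self, win_rate: float) -> int:
        """Reduce t once the exploration strategy wins often enough at the current t."""
        if win_rate >= self.threshold and self.t > 0:
            self.t = max(0, self.t - self.step)
        return self.t


# Usage example
schedule = JumpStartSchedule(t=10, step=2, threshold=0.6)
schedule.update(0.4)  # below threshold: t stays at 10
schedule.update(0.7)  # threshold reached: t drops to 8
```

Tying the reduction to a measured win rate, rather than to a fixed number of training steps, means t only shrinks once the exploration strategy can actually cope with the longer remainder of the game.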

To generate these starting states, game_state(t), we intend to perform the following steps:

  • Choose a prior strategy that can be configured to play either deterministically or non-deterministically.
  • Play the non-deterministic version of the prior strategy, either against itself or against a deterministic version of itself, up to time t.
  • Determine which player is favoured in the resulting state according to the deterministic prior strategy.
  • Play the exploration strategy against the deterministic prior strategy, taking the side of the favoured player to guarantee that it has a chance of winning.
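The four steps above could be sketched as follows. Since the issue names neither a game nor a prior strategy, the sketch uses hypothetical stand-ins: single-pile Nim (players take 1 to 3 stones, taking the last stone wins) and an epsilon-greedy prior whose deterministic mode (epsilon = 0) always tries to leave a multiple of 4 stones. The function names `make_prior`, `unroll`, `favoured_player`, `play_from` and `jump_start_state` are illustrative only, not the repository's API.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in game: single-pile Nim. Players 0 and 1 alternately
# remove 1-3 stones; whoever takes the last stone wins.
State = Tuple[int, int]  # (stones_left, player_to_move)
Policy = Callable[[State], int]


def legal_moves(state: State) -> List[int]:
    return [m for m in (1, 2, 3) if m <= state[0]]


def apply_move(state: State, move: int) -> State:
    return (state[0] - move, 1 - state[1])


def winner(state: State) -> Optional[int]:
    # With no stones left, the player who just moved (not the one to move) won.
    return 1 - state[1] if state[0] == 0 else None


def make_prior(epsilon: float, rng: random.Random) -> Policy:
    """Step 1: a prior strategy; epsilon=0 makes it deterministic."""
    def policy(state: State) -> int:
        moves = legal_moves(state)
        if rng.random() < epsilon:
            return rng.choice(moves)  # non-deterministic move
        for m in moves:
            if (state[0] - m) % 4 == 0:  # leave the opponent a multiple of 4
                return m
        return moves[0]
    return policy


def unroll(state: State, policies: Tuple[Policy, Policy], t: int) -> State:
    """Step 2: let the prior strategies (indexed by player) play up to t moves."""
    for _ in range(t):
        if winner(state) is not None:
            break
        state = apply_move(state, policies[state[1]](state))
    return state


def favoured_player(state: State, deterministic: Policy) -> int:
    """Step 3: the player who wins deterministic self-play from `state`."""
    while winner(state) is None:
        state = apply_move(state, deterministic(state))
    return winner(state)


def play_from(state: State, explorer: Policy, explorer_player: int,
              opponent: Policy) -> int:
    """Step 4: the explorer plays as `explorer_player`; returns the winner."""
    while winner(state) is None:
        mover = explorer if state[1] == explorer_player else opponent
        state = apply_move(state, mover(state))
    return winner(state)


def jump_start_state(initial: State, t: int, seed: int = 0,
                     against_deterministic: bool = False) -> Tuple[State, int]:
    """Steps 1-3: build game_state(t) and the side the explorer should take."""
    rng = random.Random(seed)
    stochastic = make_prior(0.3, rng)
    deterministic = make_prior(0.0, rng)
    opponent = deterministic if against_deterministic else stochastic
    state = unroll(initial, (stochastic, opponent), t)
    # If the game already ended within t moves the state is terminal;
    # a real training loop would probably discard and resample it.
    return state, favoured_player(state, deterministic)


# Usage example: generate a start state and let a placeholder explorer play it.
start, side = jump_start_state((10, 0), t=3, seed=42)
explorer = make_prior(0.3, random.Random(1))  # stands in for the policy being trained
result = play_from(start, explorer, side, make_prior(0.0, random.Random(2)))
```

Using the same deterministic prior both to pick the favoured side (step 3) and as the opponent (step 4) keeps the two steps consistent: the explorer is only ever placed on a side that the opponent it faces cannot force a win against.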

Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1

masus04, Aug 20 '22 13:08