Deep-Reinforcement-Learning-for-Boardgames
Implement JSRL-like training strategy
This approach uses a prior strategy to unroll the game up to a certain time step t, then lets the exploration strategy being trained take over. The value of t is then gradually reduced as the exploration strategy improves.
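The gradual reduction of t could be driven by a simple performance threshold. The sketch below is only one possible rule, not something this issue specifies: the function name `next_handover_time`, the `threshold` and `step` parameters, and the use of a recent win rate as the improvement signal are all assumptions made for illustration.

```python
def next_handover_time(t, win_rate, threshold=0.55, step=1):
    """Return the handover time t for the next training stage.

    Hypothetical rule: t is lowered by `step` only once the exploration
    strategy's recent win rate reaches `threshold`, and never drops below 0.
    """
    if win_rate >= threshold:
        return max(0, t - step)
    return t


# Example: the exploration strategy wins 60% of recent games from t = 10,
# so it is handed control one move earlier in the next stage.
t = next_handover_time(10, win_rate=0.60)
```

At t = 0 the exploration strategy plays the whole game by itself, which is the end point of the curriculum.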
In order to generate instances of game_state(t), we intend to perform the following steps:
- Choose a prior strategy that can be configured to play deterministically or non-deterministically
- Play the non-deterministic version of the prior strategy either against itself or against a deterministic version of itself up to time t
- Determine which player is favoured according to the deterministic prior strategy
- Play the exploration strategy against the deterministic prior strategy, playing as the favoured player in order to guarantee it has a chance of winning.
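The steps above can be sketched end to end on a toy game. Everything in this example is an assumption made for illustration rather than code from this repository: the game (take 1 to 3 stones from a pile, last stone wins), the names `prior_strategy`, `generate_game_state`, `favoured_player` and `jsrl_episode`, and the choice to decide the favoured player by letting the deterministic prior finish the game against itself.

```python
import random

# Toy game (assumed for illustration): players 0 and 1 alternately take
# 1-3 stones from a pile; whoever takes the last stone wins.
START_STONES = 21


def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


def prior_strategy(stones, rng=None, epsilon=0.3):
    """Hypothetical prior. rng=None gives the deterministic version;
    with an rng, a random legal move is played with probability epsilon."""
    if rng is not None and rng.random() < epsilon:
        return rng.choice(legal_moves(stones))
    best = stones % 4  # leaves a multiple of 4 whenever that is possible
    return best if best in legal_moves(stones) else 1


def play(stones, to_move, strategies):
    """Play to the end; strategies[p] picks player p's move. Returns the winner."""
    while True:
        stones -= strategies[to_move](stones)
        if stones == 0:
            return to_move
        to_move = 1 - to_move


def generate_game_state(t, rng):
    """Unroll up to t moves of non-deterministic prior self-play."""
    stones, to_move = START_STONES, 0
    for _ in range(t):
        move = prior_strategy(stones, rng)
        if move == stones:  # stop before the game would end
            break
        stones -= move
        to_move = 1 - to_move
    return stones, to_move


def favoured_player(stones, to_move):
    """The player who wins when the deterministic prior plays both sides."""
    return play(stones, to_move, [prior_strategy, prior_strategy])


def jsrl_episode(t, exploration_strategy, rng):
    """One episode: the exploration strategy takes the favoured seat at
    game_state(t) and plays against the deterministic prior."""
    stones, to_move = generate_game_state(t, rng)
    favoured = favoured_player(stones, to_move)
    strategies = [prior_strategy, prior_strategy]
    strategies[favoured] = exploration_strategy
    winner = play(stones, to_move, strategies)
    return favoured, winner


rng = random.Random(0)
# Placeholder for the strategy being trained: uniformly random legal moves.
favoured, winner = jsrl_episode(4, lambda s: rng.choice(legal_moves(s)), rng)
```

Because the favoured seat is the one the deterministic prior wins from, the exploration strategy always starts from a position where at least one winning line exists, namely the prior's own.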
Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1