Deep-Reinforcement-Learning-for-Boardgames Implement JSRL like training strategy

Implement JSRL-like training strategy

Open masus04 opened this issue 1 year ago • 0 comments

This approach uses a prior strategy to play the game up to a certain time step t, then lets the exploration strategy being trained take over. t is gradually reduced as the exploration strategy improves.
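A minimal sketch of how t could be reduced over training. The rule used here (lower t by a fixed step once a measured win rate reaches a threshold) is an assumption for illustration; this issue only states that t is gradually reduced. The function name `next_handover_step` and the parameters `threshold` and `step` are hypothetical.

```python
def next_handover_step(t: int, win_rate: float,
                       threshold: float = 0.6, step: int = 1) -> int:
    """Return the handover time step to use in the next training stage.

    Assumption: the exploration strategy "has improved" once its win rate
    from game_state(t) reaches `threshold`. t never drops below 0, at which
    point the exploration strategy plays the whole game itself.
    """
    if win_rate >= threshold:
        return max(0, t - step)
    return t  # not good enough yet: keep training from the same t
```

For example, `next_handover_step(10, 0.7)` moves the handover one step earlier, while a win rate below the threshold leaves t unchanged.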

To generate the states game_state(t), we intend to perform the following steps:

1. Choose a prior strategy that can be configured to play either deterministically or non-deterministically.
2. Play the non-deterministic version of the prior strategy, either against itself or against a deterministic version of itself, up to time t.
3. Determine which player is favoured in game_state(t) according to the deterministic prior strategy.
4. Play the exploration strategy against the deterministic prior strategy, with the exploration strategy playing as the favoured player so that it is guaranteed a chance of winning.
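The steps above can be sketched on a toy game. Everything in this example is an assumption made for illustration and is not taken from the repository: the game (a Nim variant where players take 1 to 3 stones and whoever takes the last stone wins), the epsilon-random version of the prior, and all function names. Here "favoured" is interpreted as the player who wins when the deterministic prior plays itself from game_state(t).

```python
import random
from typing import Callable, Tuple

State = Tuple[int, int]          # (stones left, player to move: 0 or 1)
Policy = Callable[[State], int]  # returns the number of stones to take

def legal_moves(state: State) -> list:
    return [n for n in (1, 2, 3) if n <= state[0]]

def step(state: State, move: int) -> State:
    return (state[0] - move, 1 - state[1])

def is_over(state: State) -> bool:
    return state[0] == 0

def winner(state: State) -> int:
    # The player who took the last stone is the one NOT to move now.
    return 1 - state[1]

def prior_deterministic(state: State) -> int:
    # Step 1: a fixed prior strategy (leave a multiple of 4 if possible).
    return state[0] % 4 or 1

def make_noisy_prior(rng: random.Random, epsilon: float = 0.3) -> Policy:
    # Step 1: the non-deterministic variant, random with probability epsilon.
    def policy(state: State) -> int:
        if rng.random() < epsilon:
            return rng.choice(legal_moves(state))
        return prior_deterministic(state)
    return policy

def rollout(state: State, policies, max_moves=None) -> State:
    moves = 0
    while not is_over(state) and (max_moves is None or moves < max_moves):
        state = step(state, policies[state[1]](state))
        moves += 1
    return state

def favoured_player(state: State) -> int:
    # Step 3: let the deterministic prior play itself and see who wins.
    return winner(rollout(state, (prior_deterministic, prior_deterministic)))

def play_from_prior_state(explorer: Policy, t: int, rng: random.Random,
                          start: State = (21, 0)):
    # Step 2: noisy prior vs. deterministic prior for the first t moves.
    state_t = rollout(start, (make_noisy_prior(rng), prior_deterministic),
                      max_moves=t)
    if is_over(state_t):
        raise ValueError("game ended before step t; choose a smaller t")
    seat = favoured_player(state_t)
    # Step 4: the explorer takes the favoured seat against the prior.
    policies = [prior_deterministic, prior_deterministic]
    policies[seat] = explorer
    final = rollout(state_t, policies)
    return state_t, seat, winner(final) == seat
```

Because the favoured seat is the one the deterministic prior wins from, an explorer that simply copies the prior always wins from game_state(t). That is the sense in which the trained strategy has a chance of winning in this sketch.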

Blog post: https://ai.googleblog.com/2022/04/efficiently-initializing-reinforcement.html?m=1

Aug 20 '22 13:08 masus04
