Stoix
[FEATURE] Implement self-play for two-player zero-sum games
Description: Add self-play versions of DQN and PPO for two-player zero-sum games in PGX environments.
Checklist:
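- [ ] Determine how to keep value estimates consistent across the two players (e.g. flip the board so every observation is from the current player's perspective, or reverse the sign of the discount so that opponent values are negated)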
- [ ] Add PGX environment configs
- [ ] Implement self-play for DQN
- [ ] Implement self-play for PPO
- [ ] (optional) Implement self-play for AlphaZero, if feasible
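For the first checklist item, here is a minimal sketch of the "negate opponent values" option (negamax-style bootstrapping). It is only an illustration under stated assumptions and is not Stoix or PGX code. The function names and the buffer layout are hypothetical. It assumes each transition stores the reward from the acting player's perspective and the id of the player who acted. PGX states expose a `current_player` field that could supply those ids. The sketch covers a multi-step return (for PPO-style targets) and a one-step DQN target. In both, the bootstrapped value is negated whenever the turn passes to the opponent.

```python
import numpy as np


def zero_sum_returns(rewards, players, dones, bootstrap_value, bootstrap_player, gamma=1.0):
    """Discounted returns for a two-player zero-sum trajectory (hypothetical helper).

    Every return is expressed from the perspective of the player who acted at
    that step. Assumed buffer layout (not Stoix's actual one):
      rewards[t] -- reward received by players[t] for its action at step t
      players[t] -- id of the player who acted at step t
      dones[t]   -- True if the game ended after step t
      bootstrap_value / bootstrap_player -- value estimate of the state after
        the last step, and the player to move in that state
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    next_value, next_player = float(bootstrap_value), bootstrap_player
    for t in reversed(range(len(rewards))):
        if dones[t]:
            future = 0.0  # terminal step: nothing to bootstrap from
        else:
            # Zero-sum: a value held by the opponent is the negation of ours.
            sign = 1.0 if next_player == players[t] else -1.0
            future = sign * next_value
        returns[t] = rewards[t] + gamma * future
        next_value, next_player = returns[t], players[t]
    return returns


def negamax_td_target(reward, next_q, legal_mask, done, same_player, gamma=1.0):
    """One-step DQN target with the bootstrap negated when the turn passes (hypothetical helper).

    next_q:      [batch, num_actions] Q-values of the next state, from the
                 perspective of the player to move in that state.
    legal_mask:  [batch, num_actions] boolean mask of legal next actions.
    done:        [batch] True if the game ended on this transition.
    same_player: [batch] True if the same player moves again next.
    """
    # Greedy bootstrap over legal actions only.
    best_next = np.where(legal_mask, next_q, -np.inf).max(axis=-1)
    sign = np.where(same_player, 1.0, -1.0)
    # np.where discards the (possibly infinite) bootstrap for terminal states.
    future = np.where(done, 0.0, sign * best_next)
    return reward + gamma * future
```

Consider a three-move game in which player 0 moves, then player 1, then player 0 wins with reward 1, using `gamma=0.9`. `zero_sum_returns` gives player 0's moves positive returns (0.81 and 1.0) and player 1's move a negative return (-0.9). When players strictly alternate, this per-step sign flip is the same as applying a discount of -γ to the bootstrapped value. That matches the "reverse the sign of the discount" option above. Games where a player can move twice in a row need the explicit player check.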