Stoix
[FEATURE] Implement self-play for two-player zero-sum games
Issue: #99
Description: Add self-play versions of DQN and PPO for two-player zero-sum games in PGX environments.
Checklist:
- [x] Determine how to keep value estimates consistent across the two players (e.g. flip the board so the observation is always from the current player's perspective, or use a negative discount)
- [x] Add PGX environment configs
- [ ] Implement self-play for DQN
- [ ] Implement self-play for PPO
- [ ] (optional) Implement self-play for AlphaZero, if feasible
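
For reference, the "negative discount" option from the first checklist item could look roughly like the sketch below. It is not taken from the Stoix codebase: the function name `negamax_returns` and its arguments are hypothetical, and it assumes players strictly alternate turns, that `rewards[t]` is expressed from the perspective of the player who acted at step `t`, and that the game is zero-sum, so the next player's return is the negative of the current player's.

```python
from typing import List, Sequence


def negamax_returns(
    rewards: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 1.0,
    bootstrap_value: float = 0.0,
) -> List[float]:
    """Discounted returns for an alternating-turn, zero-sum trajectory.

    Hypothetical helper (not a Stoix or PGX API). rewards[t] is the reward
    from the perspective of the player who acted at step t. bootstrap_value
    is the value estimate for the player to move after the last step.
    """
    returns = [0.0] * len(rewards)
    next_return = bootstrap_value
    for t in reversed(range(len(rewards))):
        # Stop bootstrapping across episode boundaries.
        not_done = 0.0 if dones[t] else 1.0
        # The next step belongs to the opponent, so its return counts
        # against the current player: discount by -gamma instead of +gamma.
        returns[t] = rewards[t] + (-gamma) * not_done * next_return
        next_return = returns[t]
    return returns


if __name__ == "__main__":
    # The player acting at the last step wins (+1 from their perspective).
    print(negamax_returns([0.0, 0.0, 1.0], [False, False, True]))
```

With these inputs the same player acts at steps 0 and 2, so both of those steps get a positive return while the opponent's step 1 gets a negative one. The same sign flip would apply to a one-step TD target for DQN or to GAE for PPO, under the same assumptions.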