gym-bandits
Nonstationary bandit
Adds a new bandit with moving q*(a) values: the true action values change after each step. This setting is described in Reinforcement Learning: An Introduction (Sutton and Barto), Section 2.5, "Tracking a Nonstationary Problem".
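The idea can be illustrated with a minimal sketch, assuming each q*(a) moves by a fixed amount after every step. The class name ConstantShiftBandit, the random sign of the shift, the unit-variance Gaussian reward, and the default values are all assumptions made for illustration; this is not the gym-bandits API.

```python
import random


class ConstantShiftBandit:
    """Hypothetical k-armed bandit whose true values q*(a) drift each step."""

    def __init__(self, k=10, shift=0.01, seed=None):
        self.rng = random.Random(seed)
        self.shift = shift
        # Assumption: all arms start with the same true value.
        self.q_star = [0.0] * k

    def step(self, action):
        # Reward is sampled around the chosen arm's current true value
        # (unit standard deviation is an assumption).
        reward = self.rng.gauss(self.q_star[action], 1.0)
        # After every step, each arm moves up or down by the same constant
        # amount; the random direction is an assumption.
        self.q_star = [
            q + self.rng.choice((-self.shift, self.shift)) for q in self.q_star
        ]
        return reward


bandit = ConstantShiftBandit(k=3, shift=0.5, seed=0)
for _ in range(4):
    bandit.step(0)
print(bandit.q_star)  # true values after four shifts
```

Because the true values keep moving, an agent that averages all past rewards equally will lag behind; a constant step-size update weights recent rewards more heavily and tracks the drift better.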
Potential future addition: a nonstationary bandit in which the shift in q*(a) values is drawn from a normal distribution rather than being a constant amount each step.
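One way such a variant might look is sketched below, where every q*(a) takes an independent random walk with normally distributed increments. The class name GaussianWalkBandit, the walk_std parameter, and its default of 0.01 are hypothetical and chosen only for illustration; nothing here is part of gym-bandits.

```python
import random


class GaussianWalkBandit:
    """Hypothetical k-armed bandit whose q*(a) values follow a random walk."""

    def __init__(self, k=10, walk_std=0.01, seed=None):
        self.rng = random.Random(seed)
        self.walk_std = walk_std
        # Assumption: all arms start with the same true value.
        self.q_star = [0.0] * k

    def step(self, action):
        # Reward is sampled around the chosen arm's current true value
        # (unit standard deviation is an assumption).
        reward = self.rng.gauss(self.q_star[action], 1.0)
        # Each arm's true value shifts by an independent draw from
        # a normal distribution with mean 0 and std walk_std.
        self.q_star = [
            q + self.rng.gauss(0.0, self.walk_std) for q in self.q_star
        ]
        return reward


bandit = GaussianWalkBandit(k=5, walk_std=0.01, seed=42)
rewards = [bandit.step(action=2) for _ in range(100)]
print(bandit.q_star)  # true values after 100 random-walk steps
```

Setting walk_std to 0 recovers a stationary bandit, which makes it easy to compare agents on both settings with the same code.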