
Nonstationary bandit

Open MouseAndKeyboard opened this issue 5 years ago • 0 comments

Added a new bandit whose q*(a) values move: the values change after each step. This is the setting discussed in Reinforcement Learning: An Introduction (Sutton and Barto), Section 2.5, "Tracking a Nonstationary Problem".
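For readers who want to picture the idea, here is a minimal sketch of a bandit whose true values shift by a constant amount after every step. It is not the code from this change: the class name `ConstantDriftBandit`, the `drift` parameter, the per-arm drift direction, and the unit-variance reward noise are all assumptions made for illustration. The `update_estimate` helper shows the constant step-size update that Section 2.5 of the book uses for tracking.

```python
import random


class ConstantDriftBandit:
    """Hypothetical k-armed bandit whose true values q*(a) move by a fixed amount each step."""

    def __init__(self, k=10, drift=0.01, seed=None):
        self.rng = random.Random(seed)
        self.drift = drift
        # Initial true action values drawn from a standard normal (assumption).
        self.q_star = [self.rng.gauss(0.0, 1.0) for _ in range(k)]
        # Each arm drifts in a fixed direction chosen once at construction (assumption).
        self.direction = [self.rng.choice((-1.0, 1.0)) for _ in range(k)]

    def step(self, action):
        # Reward is noisy around the current true value of the chosen arm.
        reward = self.rng.gauss(self.q_star[action], 1.0)
        # After every step, all true values shift by the constant amount.
        self.q_star = [q + d * self.drift for q, d in zip(self.q_star, self.direction)]
        return reward


def update_estimate(q_estimate, reward, alpha=0.1):
    """Constant step-size update: Q <- Q + alpha * (R - Q)."""
    return q_estimate + alpha * (reward - q_estimate)
```

A constant `alpha` weights recent rewards more heavily than old ones, which is why it suits a problem where q*(a) keeps moving.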

Potential future addition: a nonstationary bandit in which the shift in q*(a) at each step is drawn from a normal distribution, rather than being a constant amount.
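A rough sketch of what that variant might look like follows. Nothing here exists in the repository: `RandomWalkBandit`, `walk_std`, the zero starting values, and the unit-variance reward noise are hypothetical choices for illustration only.

```python
import random


class RandomWalkBandit:
    """Hypothetical k-armed bandit whose true values q*(a) take Gaussian random-walk steps."""

    def __init__(self, k=10, walk_std=0.01, seed=None):
        self.rng = random.Random(seed)
        self.walk_std = walk_std
        # All true values start equal (assumption).
        self.q_star = [0.0] * k

    def step(self, action):
        # Reward is noisy around the current true value of the chosen arm.
        reward = self.rng.gauss(self.q_star[action], 1.0)
        # Each true value moves by an independent zero-mean normal increment.
        self.q_star = [q + self.rng.gauss(0.0, self.walk_std) for q in self.q_star]
        return reward
```

Because the increments are independent per arm, the best arm can change over time, so an agent has to keep exploring rather than lock in an early favourite.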

MouseAndKeyboard, Jan 15 '20 03:01