Epsilon decay is done every step instead of every episode
The epsilon decay in the code lives inside the method agent.replay(),
which is called every step, so epsilon declines rapidly during the first episode. I don't know if this was the intended behavior, but I've gotten better results by moving the epsilon decay into a separate method and calling it once at the end of each episode.
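
Here is a minimal sketch of the change I mean. It is not the repository's actual code: the class layout, the `decay_epsilon` name, the `run` helper, and the default values (`epsilon_min=0.01`, `epsilon_decay=0.995`) are assumptions for illustration, and the training logic inside `replay()` is left out.

```python
class Agent:
    """Stripped-down agent that only tracks the exploration rate (hypothetical layout)."""

    def __init__(self, epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.995):
        # Default values are assumed for illustration only.
        self.epsilon = epsilon
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay

    def replay(self):
        # Minibatch training would go here.
        # Note: epsilon is no longer decayed in this per-step call.
        pass

    def decay_epsilon(self):
        # Multiplicative decay, clamped so epsilon never drops below epsilon_min.
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)


def run(agent, episodes, steps_per_episode):
    """Hypothetical training loop: replay every step, decay once per episode."""
    for _ in range(episodes):
        for _ in range(steps_per_episode):
            agent.replay()
        agent.decay_epsilon()  # called once, at the end of the episode
    return agent.epsilon
```

With these assumed values, epsilon falls by a factor of 0.995 per episode regardless of episode length. If the decay stayed inside `replay()`, a single 200-step episode would already multiply epsilon by 0.995^200, which is roughly 0.37.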