deep-rl-class
unit2 - train
-
I added (and commented) the following formula for the epsilon calculation. Unlike the current formula, it depends on "n_training_episodes", so epsilon decays exponentially from "max_epsilon" to "min_epsilon" over the whole range of episodes, regardless of the value of "n_training_episodes". The figure shows the output of the two formulas for some values of "n_training_episodes". The formula is: epsilon = max_epsilon * ((min_epsilon/max_epsilon)**(1/(n_training_episodes-1))) ** episode
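As a quick check of the formula above, here is a minimal sketch that wraps it in a function and evaluates it at the first and last episode. The function name `geometric_epsilon`, the default values for `max_epsilon` and `min_epsilon`, and the guard for fewer than two episodes are my own assumptions for illustration, not part of the notebook.

```python
def geometric_epsilon(episode, n_training_episodes, max_epsilon=1.0, min_epsilon=0.05):
    """Epsilon for a given episode index (0-based), using the formula from this PR.

    The default bounds are illustrative values, not taken from the notebook.
    """
    if n_training_episodes < 2:
        # Assumption: with a single episode there is nothing to decay over,
        # and the formula would divide by zero, so return the starting value.
        return max_epsilon
    # Constant per-episode multiplier chosen so that the last episode
    # (index n_training_episodes - 1) lands on min_epsilon.
    ratio = (min_epsilon / max_epsilon) ** (1 / (n_training_episodes - 1))
    return max_epsilon * ratio ** episode


if __name__ == "__main__":
    n = 1000
    print(geometric_epsilon(0, n))      # first episode: equals max_epsilon
    print(geometric_epsilon(n - 1, n))  # last episode: approximately min_epsilon
```

Because the multiplier is derived from "n_training_episodes", the schedule always spans the full range from "max_epsilon" to "min_epsilon", whatever the number of episodes.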
-
The following lines in the "train" function were removed. The "step" variable is unused, and "terminated" and "truncated" are assigned from the output of "env.step(action)" before their first use, so they do not need to be initialized.
- step = 0
- terminated = False
- truncated = False
-
The "for" loop counter "step" and the "info" variable were replaced with "_" because they are unused.
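To illustrate the cleanup described above, here is a rough sketch of what a training loop looks like without the removed initializations and with "_" for the unused values. It is not the notebook's code: `TinyChainEnv` is a hypothetical stand-in that only mimics the Gymnasium-style `reset()` / `step()` return values, and the `learning_rate`, `gamma`, and `rng` parameters are illustrative assumptions.

```python
import random


class TinyChainEnv:
    """Hypothetical 4-state chain: start at state 0, reach state 3 to finish."""
    n_states = 4
    n_actions = 2  # 0 = move left, 1 = move right

    def reset(self):
        self.state = 0
        return self.state, {}  # (observation, info)

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        terminated = self.state == self.n_states - 1
        reward = 1.0 if terminated else 0.0
        # (observation, reward, terminated, truncated, info)
        return self.state, reward, terminated, False, {}


def train(n_training_episodes, min_epsilon, max_epsilon, env, max_steps, qtable,
          learning_rate=0.7, gamma=0.95, rng=None):
    """Tabular Q-learning sketch; assumes n_training_episodes >= 2."""
    rng = rng or random.Random(0)
    for episode in range(n_training_episodes):
        # Epsilon schedule proposed in this PR.
        epsilon = max_epsilon * ((min_epsilon / max_epsilon)
                                 ** (1 / (n_training_episodes - 1))) ** episode
        state, _ = env.reset()  # "info" is unused
        # No "step = 0", "terminated = False" or "truncated = False" here:
        # both flags are assigned by env.step() before they are read.
        for _ in range(max_steps):  # the loop counter is unused
            if rng.random() > epsilon:
                action = max(range(env.n_actions), key=lambda a: qtable[state][a])
            else:
                action = rng.randrange(env.n_actions)
            new_state, reward, terminated, truncated, _ = env.step(action)
            qtable[state][action] += learning_rate * (
                reward + gamma * max(qtable[new_state]) - qtable[state][action]
            )
            if terminated or truncated:
                break
            state = new_state
    return qtable


if __name__ == "__main__":
    table = [[0.0, 0.0] for _ in range(TinyChainEnv.n_states)]
    train(200, 0.05, 1.0, TinyChainEnv(), 20, table)
    print(table)
```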
Thanks for pointing this out. I'm adding this for the December update.