deep-rl-class unit2

unit2 - train

Open fardinafdideh opened this issue 1 year ago • 1 comment

I added (and commented) the following formula for computing epsilon. Unlike the current formula, it depends on "n_training_episodes" (the figure shows the output of both formulas for several values of "n_training_episodes"). As a result, regardless of "n_training_episodes", epsilon decays exponentially from "max_epsilon" to "min_epsilon" over the whole training run:
epsilon = max_epsilon * ((min_epsilon/max_epsilon)**(1/(n_training_episodes-1))) ** episode
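To make the endpoint behaviour concrete, here is a minimal sketch of the proposed schedule as a standalone function. The function name `epsilon_schedule` and the default values for `max_epsilon` and `min_epsilon` are assumptions for illustration, not taken from the notebook, and the formula needs `min_epsilon > 0` and at least two episodes.

```python
def epsilon_schedule(episode, n_training_episodes, max_epsilon=1.0, min_epsilon=0.05):
    """Exponential decay that hits max_epsilon at episode 0 and
    min_epsilon at the last episode (n_training_episodes - 1).

    The default values are illustrative assumptions.
    """
    if n_training_episodes < 2:
        # The exponent 1 / (n_training_episodes - 1) is undefined for a single episode.
        return max_epsilon
    # Per-episode multiplicative factor, chosen so that
    # factor ** (n_training_episodes - 1) == min_epsilon / max_epsilon.
    factor = (min_epsilon / max_epsilon) ** (1 / (n_training_episodes - 1))
    return max_epsilon * factor ** episode


if __name__ == "__main__":
    for n in (100, 10_000):
        first = epsilon_schedule(0, n)
        last = epsilon_schedule(n - 1, n)
        print(f"n_training_episodes={n}: first={first:.4f}, last={last:.4f}")
```

Because the per-episode factor is derived from "n_training_episodes", the first and last episodes land on "max_epsilon" and "min_epsilon" whatever the run length is.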
The following lines were removed from the "train" function. The "step" variable is unused, and "terminated" and "truncated" are assigned from the output of "env.step(action)" before their first use, so they do not need to be initialized.
- step = 0
- terminated = False
- truncated = False
The "for" loop counter "step" and the "info" variable were replaced with "_" because they are unused.
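For reference, a rough sketch of what a training loop looks like after this cleanup. This is not the notebook's code: `ToyChainEnv` is a hypothetical stand-in environment with a Gymnasium-style `reset`/`step` interface, the Q-learning update and the hyperparameter defaults are assumptions, and only the loop structure (no pre-initialized `step`, `terminated`, or `truncated`, and `_` for unused values) reflects the change described above.

```python
import random


class ToyChainEnv:
    """Hypothetical 5-state chain with a Gymnasium-style API (illustration only).

    Action 1 moves right, action 0 moves left; reaching the last state ends
    the episode with reward 1.
    """
    n_states, n_actions = 5, 2

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        terminated = self.state == self.n_states - 1
        reward = 1.0 if terminated else 0.0
        # observation, reward, terminated, truncated, info
        return self.state, reward, terminated, False, {}


def train(env, n_training_episodes, max_steps, max_epsilon=1.0, min_epsilon=0.05,
          learning_rate=0.7, gamma=0.95, seed=0):
    # Requires n_training_episodes >= 2 because of the epsilon formula.
    rng = random.Random(seed)
    qtable = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for episode in range(n_training_episodes):
        # Proposed schedule: depends on n_training_episodes.
        epsilon = max_epsilon * (
            (min_epsilon / max_epsilon) ** (1 / (n_training_episodes - 1))
        ) ** episode
        state, _ = env.reset()            # info is unused
        for _ in range(max_steps):        # loop counter is unused
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)  # explore
            else:
                action = max(range(env.n_actions), key=lambda a: qtable[state][a])
            # terminated and truncated are first assigned here, before any use.
            new_state, reward, terminated, truncated, _ = env.step(action)
            td_target = reward + gamma * max(qtable[new_state])
            qtable[state][action] += learning_rate * (td_target - qtable[state][action])
            if terminated or truncated:
                break
            state = new_state
    return qtable


if __name__ == "__main__":
    for row in train(ToyChainEnv(), n_training_episodes=200, max_steps=50):
        print([round(q, 3) for q in row])
```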

Dec 10 '23 18:12 fardinafdideh

Thanks for pointing this out. I’m adding this for the December update.

Dec 12 '23 08:12 simoninithomas