bsuite
_total_steps assignment position bug
_total_steps should be incremented after the agent takes an action, as in the JAX implementation, or during the update function call, as for DQN. In the current implementation the increment happens after the gradient is computed. Since the gradient step only runs when _total_steps is a multiple of sgd_period, the counter stops advancing once it is no longer a multiple, and the agent is not trained for any sgd_period greater than one.
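The effect can be reproduced with a toy counter, independent of bsuite. This is a minimal sketch, not the agent's actual code: `count_sgd_steps` and its `increment_inside_sgd` flag are hypothetical names, and the gradient step is replaced by a plain counter. It assumes the gradient step is gated on `total_steps % sgd_period == 0`.

```python
def count_sgd_steps(num_updates, sgd_period, increment_inside_sgd):
    """Counts how many (stand-in) gradient steps run over num_updates calls.

    increment_inside_sgd=True mimics the reported behaviour: the step counter
    only advances after a gradient step. False mimics the proposed fix: the
    counter advances on every update call. Both are simplified assumptions.
    """
    total_steps = 0
    sgd_steps = 0
    for _ in range(num_updates):
        if not increment_inside_sgd:
            total_steps += 1  # proposed fix: advance on every update call
        if total_steps % sgd_period == 0:
            sgd_steps += 1  # stand-in for computing and applying a gradient
            if increment_inside_sgd:
                total_steps += 1  # reported bug: advance only after the gradient
    return sgd_steps


if __name__ == "__main__":
    print(count_sgd_steps(100, 4, increment_inside_sgd=True))
    print(count_sgd_steps(100, 4, increment_inside_sgd=False))
    print(count_sgd_steps(100, 1, increment_inside_sgd=True))
```

In this sketch, with the increment inside the gradient step and sgd_period=4, the gate passes once while the counter is 0 and never again. With sgd_period=1 the problem is hidden, because every value of the counter passes the gate.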