Dmitry Gavrilenko
In samples/rainbow/05_dqn_prio_replay.py, the weights are propagated to batch_weights_v and multiplied by (state_action_values - expected_state_action_values) ** 2 to calculate losses_v. (losses_v + 1e-5) is then used to calculate the probabilities. However, according to...
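For readers who want to follow the arithmetic being described, here is a minimal plain-Python sketch of that computation. It is not the code from the sample: the helper names weighted_sq_losses and priorities_from_losses, the toy numbers, and the use of a mean as the training loss are assumptions made for illustration, and lists stand in for the tensors.

```python
# Sketch only: plain lists stand in for the tensors named in the issue.
# Helper names and the toy values below are hypothetical.

def weighted_sq_losses(q_values, targets, weights):
    """Per-sample value of weight * (q - target) ** 2, i.e. losses_v."""
    return [w * (q - t) ** 2 for q, t, w in zip(q_values, targets, weights)]


def priorities_from_losses(losses, eps=1e-5):
    """Add a small constant so that no sample ends up with zero priority."""
    return [loss + eps for loss in losses]


# Toy batch of three transitions (made-up numbers).
state_action_values = [1.0, 2.0, 0.5]
expected_state_action_values = [1.5, 2.0, 1.5]
batch_weights = [1.0, 0.5, 0.25]

losses = weighted_sq_losses(
    state_action_values, expected_state_action_values, batch_weights
)
# Assumption: the scalar training loss is the mean of the per-sample losses.
mean_loss = sum(losses) / len(losses)
priorities = priorities_from_losses(losses)

print(losses)
print(mean_loss)
print(priorities)
```

In this sketch the first and third transitions end up with the same priority even though their squared errors differ by a factor of four, because the sampling weight is already folded into the value that becomes the priority.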