zhezherun issues


Results: 3 issues by zhezherun

GaussianPolicy applies the same noise to all actions in the batch

If GaussianPolicy receives a batched TimeStep, it applies the same noise to all actions returned by the wrapped policy. Instead, it should sample a different noise term per batch element....
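
To illustrate the difference described above, here is a minimal NumPy sketch, not the library's code. The helper names `add_shared_noise` and `add_per_element_noise` are hypothetical, and the batch is assumed to be the leading axis of the action array.

```python
import numpy as np

def add_shared_noise(actions, scale, rng):
    # Problem case: one noise vector is sampled for a single action
    # and then broadcast across the whole batch.
    noise = rng.normal(0.0, scale, size=actions.shape[1:])
    return actions + noise

def add_per_element_noise(actions, scale, rng):
    # Desired case: an independent noise sample for every batch element.
    noise = rng.normal(0.0, scale, size=actions.shape)
    return actions + noise

rng = np.random.default_rng(0)
actions = np.zeros((4, 2))  # batch of 4 actions, 2 dimensions each

shared = add_shared_noise(actions, 0.1, rng)
independent = add_per_element_noise(actions, 0.1, rng)
```

With zero actions as input, every row of `shared` is identical, while the rows of `independent` differ from one another.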

to_n_step_transition returns wrong results if episode was truncated by the time limit wrapper and N > 1

If an episode is truncated by the time limit wrapper, the last discount in that episode is set to 1.0 instead of 0.0. As a result, both the reward and...
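
A rough sketch of why the boundary discount matters when N steps are folded into one transition. This is a generic N-step accumulation written for illustration, not the implementation of `to_n_step_transition`; the helper `n_step_fold` and the example numbers are assumptions.

```python
import math

def n_step_fold(rewards, discounts, gamma):
    # Fold N consecutive steps into one reward and one cumulative discount.
    total, cumulative = 0.0, 1.0
    for reward, discount in zip(rewards, discounts):
        total += cumulative * reward
        cumulative *= gamma * discount
    return total, cumulative

# Hypothetical window: the second step is the last step of an episode,
# and the third reward (5.0) already belongs to the next episode.
rewards = [1.0, 1.0, 5.0]

# Boundary discount 0.0: nothing after the boundary contributes.
terminated = n_step_fold(rewards, [1.0, 0.0, 1.0], gamma=0.9)

# Boundary discount 1.0 (time-limit truncation): the next episode's
# reward leaks into the folded reward and the discount stays non-zero.
truncated = n_step_fold(rewards, [1.0, 1.0, 1.0], gamma=0.9)
```

In this example the folded reward and the cumulative discount both differ between the two cases, which is consistent with the issue's point that the problem only appears for N > 1.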

Incorrect calculation of generalized advantage estimates in PPO

The following code in `PPOAgent.compute_advantages` ignores value predictions for final observations in the trajectory and instead passes one-before-last values to the `generalized_advantage_estimation` function twice: ```python # Arg value_preds was appended...
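
For reference, a minimal NumPy sketch of generalized advantage estimation that shows why the bootstrap value for the final observation matters. The helper `gae` is hypothetical and is not the library's `generalized_advantage_estimation` function. The "buggy" call reflects one reading of the description above (reusing the one-before-last value as the final value), and all numbers are made up.

```python
import numpy as np

def gae(rewards, values, final_value, discounts, lambda_):
    # values[t] is V(s_t) for t = 0..T-1; final_value is V(s_T).
    next_values = np.append(values[1:], final_value)
    deltas = rewards + discounts * next_values - values
    advantages = np.zeros_like(deltas)
    running = 0.0
    # Accumulate discounted TD errors backwards through time.
    for t in reversed(range(len(deltas))):
        running = deltas[t] + discounts[t] * lambda_ * running
        advantages[t] = running
    return advantages

rewards = np.array([1.0, 1.0, 1.0])
discounts = np.array([0.9, 0.9, 0.9])       # gamma already folded in
value_preds = np.array([0.5, 0.4, 0.3, 0.2])  # one more value than rewards

# Bootstrap from the value of the final observation.
correct = gae(rewards, value_preds[:-1], value_preds[-1], discounts, 0.95)

# Assumed form of the bug: the one-before-last value is used twice.
buggy = gae(rewards, value_preds[:-1], value_preds[-2], discounts, 0.95)
```

Because each advantage depends on all later TD errors, an incorrect final value shifts every entry of the result, not only the last one.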