Astraea Quinn S comments

61 comments

N-step updates for off-policy methods

The implementation appears mostly the same. I would like to add the class and the arguments as parameters to the `OffPolicyAlgorithm` along with a variable named `replay_args`, in a similar...
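
A minimal sketch of the constructor pattern described above. It assumes the algorithm receives a buffer class plus a dict of extra constructor arguments (`replay_args`) and instantiates the buffer itself. `OffPolicyAlgorithmSketch`, `SimpleReplayBuffer`, `NStepReplayBuffer` and the `replay_class` parameter are hypothetical names for illustration, not the actual SB3 signatures.

```python
from typing import Any, Dict, Optional, Type


class SimpleReplayBuffer:
    """Hypothetical stand-in for a one-step replay buffer."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size


class NStepReplayBuffer(SimpleReplayBuffer):
    """Hypothetical n-step variant that needs extra constructor arguments."""

    def __init__(self, buffer_size: int, n_steps: int = 1, gamma: float = 0.99) -> None:
        super().__init__(buffer_size)
        self.n_steps = n_steps
        self.gamma = gamma


class OffPolicyAlgorithmSketch:
    """Hypothetical algorithm that builds whatever buffer class it is given."""

    def __init__(
        self,
        buffer_size: int,
        replay_class: Type[SimpleReplayBuffer] = SimpleReplayBuffer,
        replay_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        # The algorithm only forwards the extra arguments; it never needs to
        # know which options a particular buffer class accepts.
        self.replay_buffer = replay_class(buffer_size, **(replay_args or {}))


algo = OffPolicyAlgorithmSketch(
    buffer_size=1_000,
    replay_class=NStepReplayBuffer,
    replay_args={"n_steps": 3, "gamma": 0.99},
)
print(type(algo.replay_buffer).__name__, algo.replay_buffer.n_steps)  # NStepReplayBuffer 3
```

Passing a dict keeps buffer-specific options out of the algorithm's signature. The trade-off is that a misspelled key in `replay_args` only surfaces as a `TypeError` when the buffer is constructed.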

N-step updates for off-policy methods

@araffin So I was working on this, and there's a bug in both your implementation and mine. The bug is caused by the `% self.buffer_size` (see https://github.com/DLR-RM/stable-baselines3/blob/41eb0cb37e8ac35413c0b11d0d5bd5cfbd2c92e2/stable_baselines3/common/buffers.py#L441-L454). The bug is basically the...
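
The comment is cut off before it explains the bug, so this does not reproduce it. It is only a hypothetical sketch of one way to walk n steps through a circular buffer so that the modulo never carries a sample across the write pointer `pos` into an unrelated trajectory, and never past a `done`. `n_step_targets` and its arguments are illustrative names, not SB3 code, and time-limit truncation is ignored.

```python
import numpy as np


def n_step_targets(rewards, dones, idx, pos, n, gamma):
    """Hypothetical helper: n-step discounted returns from a circular buffer.

    rewards, dones: arrays of shape (buffer_size,) in storage order.
    idx: sampled start indices, shape (batch,).
    pos: the buffer's write pointer (next slot to be written).
    Returns (returns, bootstrap_discount, last_idx); a caller would bootstrap
    from the next observation stored at last_idx, scaled by bootstrap_discount.
    """
    buffer_size = rewards.shape[0]
    dones = dones.astype(bool)
    batch = idx.shape[0]
    returns = np.zeros(batch)
    discounts = np.ones(batch)  # gamma**k for the steps taken so far
    last_idx = idx.copy()
    terminated = np.zeros(batch, dtype=bool)
    active = np.ones(batch, dtype=bool)
    for k in range(n):
        j = (idx + k) % buffer_size
        if k > 0:
            # Reaching `pos` means the modulo wrapped from the newest stored
            # transition to the oldest one (or to an unwritten slot).
            active &= j != pos
        returns += np.where(active, discounts * rewards[j], 0.0)
        last_idx = np.where(active, j, last_idx)
        terminated |= active & dones[j]
        discounts = np.where(active, discounts * gamma, discounts)
        active &= ~dones[j]  # never sum past the end of an episode
    # gamma**m for the m steps actually taken, zero if the episode terminated
    bootstrap_discount = np.where(terminated, 0.0, discounts)
    return returns, bootstrap_discount, last_idx


# Full 4-slot buffer with pos=1: slot 0 holds the newest transition, slot 1 the oldest.
ret, disc, last = n_step_targets(np.ones(4), np.zeros(4), np.array([3]), pos=1, n=3, gamma=0.5)
print(ret, disc, last)  # [1.5] [0.25] [0]
```

Starting at slot 3, the walk uses slots 3 and 0, then stops at slot 1 because that is the write pointer, even though `(3 + 2) % 4` is a valid array index.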

N-step updates for off-policy methods

I merged master and removed the generic from `offpolicyalgorithm` since it wasn't discussed. I think it's ready for a review. P.S. Sorry for the autoformatting >.< ...

N-step updates for off-policy methods

Looks good! What about the FPS? It should have a very small impact, but there are still some optimizations that can be made.

N-step updates for off-policy methods

For SAC, I am not sure whether n-step returns can be applied directly, as I am under the impression that the backup also requires the entropy for the intermediate states,...
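
To make the concern concrete: with SAC's soft value $V(s) = \mathbb{E}_{a \sim \pi}[Q(s, a) - \alpha \log \pi(a \mid s)]$, unrolling the one-step soft backup $n$ times gives a target with an entropy term at every intermediate state. This is a sketch that ignores episode termination and substitutes the actions stored in the buffer for samples from the current policy, which is exactly the off-policy step that would need a correction:

```latex
y_t = \sum_{k=0}^{n-1} \gamma^{k} r_{t+k}
      - \alpha \sum_{k=1}^{n-1} \gamma^{k} \log \pi(a_{t+k} \mid s_{t+k})
      + \gamma^{n} \Big( Q(s_{t+n}, a') - \alpha \log \pi(a' \mid s_{t+n}) \Big),
\qquad a' \sim \pi(\cdot \mid s_{t+n})
```

Only the $k = 1, \dots, n-1$ entropy terms are new; with $n = 1$ the second sum is empty and the expression reduces to the usual one-step SAC target.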

N-step updates for off-policy methods

If you have plans for other n-step methods (I'd love to have Retrace), then we can sketch out how those would work (most of them require action probabilities) and figure...

N-step updates for off-policy methods

> The weighting is already there with the gammas, no?

In Retrace(λ), TreeBackup(λ), Q(λ), etc., the weighting also uses some other factor besides gamma, so I added it in a...
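
In the return-based framing used by the Retrace(λ) work, these methods compute the same discounted sum of TD errors and differ only in the trace coefficient $c_i$ that multiplies $\gamma$ at each step. Written as a sample-based target truncated to the $n$ stored steps (a sketch; $\mu$ denotes the behavior policy that collected the data):

```latex
\mathcal{R} Q(s_t, a_t) = Q(s_t, a_t)
  + \sum_{k=0}^{n-1} \gamma^{k} \Big( \prod_{i=1}^{k} c_{t+i} \Big)
    \Big( r_{t+k} + \gamma\, \mathbb{E}_{a \sim \pi} Q(s_{t+k+1}, a) - Q(s_{t+k}, a_{t+k}) \Big)

\text{Q}(\lambda):\; c_i = \lambda, \qquad
\text{TB}(\lambda):\; c_i = \lambda\, \pi(a_i \mid s_i), \qquad
\text{Retrace}(\lambda):\; c_i = \lambda \min\!\Big(1, \frac{\pi(a_i \mid s_i)}{\mu(a_i \mid s_i)}\Big)
```

This is also why action probabilities matter: TB(λ) needs $\pi$ of the stored actions, and Retrace(λ) additionally needs $\mu$, which can only be recorded at collection time.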

N-step updates for off-policy methods

Yeah, makes sense. I assume the function will be of the form `np.ndarray -> np.ndarray` so that it is agnostic of the device.
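
A minimal sketch of what a pluggable `np.ndarray -> np.ndarray` weighting could look like, assuming it maps per-step importance ratios to trace coefficients and that the whole target is computed in NumPy inside the buffer, before any conversion to tensors. `TraceFn`, `corrected_n_step_target` and `retrace_trace` are hypothetical names, and the window is assumed not to cross an episode boundary.

```python
from typing import Callable

import numpy as np

# Hypothetical signature for a pluggable trace/weighting function. It maps
# NumPy arrays to NumPy arrays, so it never touches tensors or a device.
TraceFn = Callable[[np.ndarray], np.ndarray]


def corrected_n_step_target(
    q_sa: np.ndarray,
    v_next: np.ndarray,
    rewards: np.ndarray,
    ratios: np.ndarray,
    trace_fn: TraceFn,
    gamma: float,
) -> np.ndarray:
    """Sketch of Q(s_t, a_t) + sum_k gamma^k (c_{t+1} ... c_{t+k}) delta_{t+k}.

    Every input has shape (batch, n) and column k refers to step t + k:
      q_sa     Q(s_{t+k}, a_{t+k})
      v_next   E_{a~pi} Q(s_{t+k+1}, a), 0 if s_{t+k+1} is terminal
      rewards  r_{t+k}
      ratios   pi(a_{t+k} | s_{t+k}) / mu(a_{t+k} | s_{t+k})
    """
    deltas = rewards + gamma * v_next - q_sa  # one-step TD errors
    c = trace_fn(ratios)
    # Products c_{t+1} * ... * c_{t+k}; the empty product for k = 0 is 1.
    trace_prods = np.cumprod(
        np.concatenate([np.ones_like(c[:, :1]), c[:, 1:]], axis=1), axis=1
    )
    discounts = gamma ** np.arange(rewards.shape[1])
    return q_sa[:, 0] + np.sum(discounts * trace_prods * deltas, axis=1)


def retrace_trace(ratios: np.ndarray) -> np.ndarray:
    """Example weighting: Retrace with lambda = 1 (importance ratio truncated at 1)."""
    return np.minimum(1.0, ratios)
```

Because only `trace_fn` changes between methods, passing `np.zeros_like` cuts the trace after the first step (a one-step target), while `np.ones_like` applies no off-policy correction at all.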

N-step updates for off-policy methods

So how should we continue from here? Since n_steps was planned for v1.1, I can take the time to add extensions and draft a more generic version of this that...

N-step updates for off-policy methods

I don't have anything to add to NStepReplay, nor any other tests, so feel free to merge this. In the meantime, I can start drafting the next step (hah,...