Antonin RAFFIN comments

Results: 769 comments of Antonin RAFFIN

N-step updates for off-policy methods

>I'd love to have retrace I would need to look at that again. But yes, it would be nice to have an eligibility-trace-style estimate (i.e. TD(lambda) style). The...

N-step updates for off-policy methods

Btw, n-step does not help for now on PyBullet envs (I need to think about why, maybe some tuning is needed) but it improves results and stability on envs like LunarLander...
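For readers new to the thread, here is a minimal sketch of the n-step target being discussed, assuming the standard textbook definition (discounted sum of up to n rewards plus a bootstrapped value for the state reached after n steps). The function name `n_step_return` and its arguments are hypothetical and are not taken from the pull request.

```python
def n_step_return(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted sum of up to n rewards plus a bootstrapped tail value.

    Hypothetical helper for illustration only:
    rewards and dones hold the next n transitions, oldest first;
    bootstrap_value is the critic's estimate for the state reached after them.
    """
    ret = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        ret += discount * reward
        discount *= gamma
        if done:
            # The episode ended inside the window: no bootstrapping past a terminal state.
            return ret
    # No termination in the window: add the discounted bootstrap term gamma^n * V.
    return ret + discount * bootstrap_value


full_window = n_step_return([1.0, 1.0, 1.0], [False, False, False], 10.0, gamma=0.5)
cut_short = n_step_return([1.0, 1.0, 1.0], [False, True, False], 10.0, gamma=0.5)
```

With n = 1 this reduces to the usual one-step TD target, which is one way to sanity-check an implementation.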

N-step updates for off-policy methods

>In retrace(lambda), treebackup(lambda), Q(lambda), etc., the weighting also uses some other factor besides gamma, thus I added it in a single term. yes, I meant that gamma already played the...

N-step updates for off-policy methods

>Yeah makes sense, I assume the function will be of the form np.ndarray -> np.ndarray, so as to make it agnostic of the device. Good idea

N-step updates for off-policy methods

Yes, sounds good ;) I would wait for support of all algorithms anyway to avoid inconsistencies. But if it is ready and fully tested, we can merge it with master.

N-step updates for off-policy methods

>I don't have anything to add to NStepReplay, or any other tests. So the entropy is not part of it? Maybe add an example in the doc.

N-step updates for off-policy methods

>I’d rather have a separate implementation that includes entropy but it’s up to you. Hmm, the issue is that if we merge it now, it will be half-working only (working...

N-step updates for off-policy methods

>Use a component called say "batch generator" that given the indices produces the batch. How is that different from `_get_samples()`? >Instead of subclassing the buffer class for the different...

[Optimization] Replay Buffers shouldn't use copy when using np.array

>I believe that we don't need to copy as the assignment is done on the original data, i.e. where the buffer is located We need to be extra careful with...
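A small sketch of the NumPy behaviour the quoted claim relies on, assuming a preallocated array as the buffer (the variable names are made up for illustration and do not come from the library): assigning into a slot of an existing array copies the values into the buffer's own memory, so later changes to the source do not leak in.

```python
import numpy as np

# Hypothetical preallocated buffer with room for 3 observations of size 2.
buffer = np.zeros((3, 2), dtype=np.float32)
obs = np.array([1.0, 2.0], dtype=np.float32)

# Indexed assignment writes the values into the buffer's own memory.
buffer[0] = obs

# Mutating the source afterwards does not affect what was stored.
obs[0] = 99.0
stored_value = float(buffer[0, 0])
shares_memory = np.shares_memory(buffer, obs)
```

This only holds for assignment into a numeric array. Storing arrays in a Python list or an object-dtype array keeps references instead, which is one case where an explicit copy still matters.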

[Optimization] Replay Buffers shouldn't use copy when using np.array

>Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors. Why would pytorch tensors be faster than numpy? The replay buffer was originally implemented with pytorch tensors...