Antonin RAFFIN
>I'd love to have retrace

I would need to look at that again, but yes, it would be nice to have an eligibility-trace-style estimate (i.e. TD(lambda) style). The...
Btw, n-step does not help for now on PyBullet envs (I need to think about why, maybe some tuning is needed), but it improves results and stability on envs like LunarLander...
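
For reference, a minimal sketch of the plain n-step target being discussed. The helper name `n_step_return` and its arguments are hypothetical and are not taken from the PR; it assumes a single trajectory slice and a value estimate for the state that follows the last step.

```python
import numpy as np


def n_step_return(rewards, dones, bootstrap_value, gamma=0.99):
    """Discounted n-step return for one trajectory slice (illustrative only).

    rewards[k] and dones[k] describe step k; bootstrap_value is the value
    estimate of the state reached after the last step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=bool)
    ret = 0.0
    weight = 1.0  # equals gamma**k at step k
    for k in range(len(rewards)):
        ret += weight * rewards[k]
        if dones[k]:
            # Episode ended inside the window: no bootstrapping
            return ret
        weight *= gamma
    # No termination: bootstrap with gamma**n * V(s_{t+n})
    return ret + weight * bootstrap_value


full = n_step_return([1.0, 1.0, 1.0], [False, False, False], 10.0, gamma=0.5)
cut = n_step_return([1.0, 1.0, 1.0], [False, True, False], 10.0, gamma=0.5)
```

With `gamma=0.5`, the first call sums the three discounted rewards and adds the discounted bootstrap value, while the second stops at the terminal step and ignores the bootstrap value.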
>In retrace(lambda), treebackup(lambda), Q(lambda), etc, the weighting also uses some other factor besides gamma, thus I added it in a single term.

Yes, I meant that gamma already played the...
>Yeah makes sense, I assume the function will be of the form np.ndarray -> np.ndarray, so as to make it agnostic of the device.

Good idea.
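
A rough sketch of what an `np.ndarray -> np.ndarray` interface could look like on the buffer side. Everything here (`ValueFn`, `bootstrap_targets`, `toy_value_fn`) is a hypothetical name chosen for illustration, not the PR's API; the point is only that the buffer never needs to know which device the underlying model lives on.

```python
from typing import Callable

import numpy as np

# Hypothetical alias: the buffer only ever exchanges NumPy arrays
ValueFn = Callable[[np.ndarray], np.ndarray]


def bootstrap_targets(
    rewards: np.ndarray,
    next_obs: np.ndarray,
    dones: np.ndarray,
    value_fn: ValueFn,
    gamma: float = 0.99,
) -> np.ndarray:
    """Compute one-step targets using a device-agnostic value function."""
    next_values = value_fn(next_obs)  # NumPy in, NumPy out
    return rewards + gamma * (1.0 - dones) * next_values


def toy_value_fn(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a wrapped model (assumption: one value per observation)."""
    return obs.sum(axis=1)


targets = bootstrap_targets(
    rewards=np.array([1.0, 2.0]),
    next_obs=np.array([[1.0, 1.0], [2.0, 2.0]]),
    dones=np.array([0.0, 1.0]),
    value_fn=toy_value_fn,
    gamma=0.5,
)
```

In practice the callable would wrap the model and handle any conversion to and from tensors internally, so the buffer code stays pure NumPy.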
Yes, sounds good ;) I would wait for support of all algorithms anyway to avoid inconsistencies. But if it is ready and fully tested, we can merge it with master.
>I don't have anything to add to NStepReplay, or any other tests.

So the entropy is not part of it? Maybe add an example in the doc.
>I’d rather have a separate implementation that includes entropy but it’s up to you.

Hmm, the issue is that if we merge it now, it will be half-working only (working...
>Use a component called say "batch generator" that given the indices produces the batch.

How is that different from `_get_samples()`?

>Instead of subclassing the buffer class for the different...
>I believe that we don't need to copy as the assignment is done on the original data, i.e. where the buffer is located

We need to be extra careful with...
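
As a small illustration of the NumPy semantics behind the copy question (a generic sketch, not code from the PR; the array name `buffer` is just a stand-in): basic slicing returns a view that shares memory with the original array, indexing with an integer array returns a copy, and assignment through an integer-array index still writes into the original.

```python
import numpy as np

buffer = np.zeros((5, 2), dtype=np.float32)

# Basic slicing gives a view: writing to it modifies the buffer
view = buffer[1:3]
view[:] = 1.0

# Indexing with an integer array gives a copy: the buffer is untouched
batch = buffer[np.array([0, 1])]
batch[:] = 7.0

# Assignment through an integer-array index writes into the original data
buffer[np.array([3, 4])] = 2.0

view_shares = np.shares_memory(view, buffer)
batch_shares = np.shares_memory(batch, buffer)
```

So whether an explicit copy is needed depends on how the sampled data was obtained and whether it is modified afterwards.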
>Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.

Why would pytorch tensors be faster than numpy? The replay buffer was originally implemented with pytorch tensors...