stable-baselines3
[Optimization] Replay Buffers shouldn't use copy when using np.array
Related to #49
In https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L198-L208
https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L346-L349
We call np.array(x).copy(). The .copy() is unnecessary because np.array already copies by default (its copy argument defaults to True).
https://numpy.org/doc/stable/reference/generated/numpy.array.html
import numpy as np
x = [1, 2, 3, 4, 5]
x1 = np.array(x)   # copies the list into a new array
x2 = np.array(x1)  # copies again: copy=True is the default
x2 is x1
# False
np.shares_memory(x1, x2)
# False
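For illustration, the fix amounts to dropping the redundant .copy() at the insertion site. A minimal sketch (the observations/pos attribute names mirror the linked buffers.py lines, but this is a toy buffer, not the exact patch):

import numpy as np

class MiniBuffer:
    # Toy replay buffer illustrating single-copy insertion.
    def __init__(self, size, obs_dim):
        self.observations = np.zeros((size, obs_dim), dtype=np.float32)
        self.pos = 0

    def add(self, obs):
        # np.array(obs) already returns a fresh copy (copy=True by default),
        # so a trailing .copy() would duplicate the data a second time.
        self.observations[self.pos] = np.array(obs)
        self.pos += 1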
Good catch. Mind if I include this in a PR with credit to you (#110)? If you find a bunch more such quirks and optimizations, feel free to bundle them up into a PR though :)
I can make a large one that deals with code optimisation/quirks and leave #110 for algorithm performance; it's up to you :)
Ok, let's make another PR for optimizations like this 👍. Let's wait until agent performance has been matched so things do not crisscross too badly.
After some more investigation, I believe that we don't need to copy, as the assignment is done on the original data, i.e. where the buffer is located, so
x[index] = np.array(data)
copies the data twice: once from data into the temporary array, and once more from the temporary into x.
Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the array (indeed, nothing else would make sense!).
https://numpy.org/doc/stable/user/basics.indexing.html#assigning-values-to-indexed-arrays
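A quick self-contained check of that behaviour (an illustrative snippet, not from the codebase):

import numpy as np

buffer = np.zeros((4, 3), dtype=np.float32)
data = np.ones(3, dtype=np.float32)

buffer[0] = data  # indexed assignment copies element-wise into buffer's own memory
data[0] = 99.0    # a later mutation of the source...
print(buffer[0])  # ...does not leak into the buffer: [1. 1. 1.]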
> I believe that we don't need to copy, as the assignment is done on the original data, i.e. where the buffer is located
We need to be extra careful with that one; I remember having weird issues that apparently came from changes by reference (hence the extra-conservative copy currently in the code).
But if everything is copied, then perfect.
> But if everything is copied, then perfect.
I played a bit with it in a local implementation and there seemed to be some speed degradation when I removed the second copy, so I suggest that we investigate both scenarios when the time comes.
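For when that investigation happens, a rough micro-benchmark sketch (buffer shape and iteration counts are arbitrary assumptions; absolute numbers will vary by machine):

import timeit
import numpy as np

buf = np.zeros((100_000, 84), dtype=np.float32)
obs = np.random.rand(84).astype(np.float32)

def two_copies():
    buf[0] = np.array(obs).copy()  # current behaviour: copy, then copy again

def one_copy():
    buf[0] = np.array(obs)         # np.array alone already copies

def assign_only():
    buf[0] = obs                   # indexed assignment copies element-wise by itself

for fn in (two_copies, one_copy, assign_only):
    print(fn.__name__, timeit.timeit(fn, number=100_000))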
Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.
> Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.
Why would PyTorch tensors be faster than NumPy? The replay buffer originally used PyTorch tensors for storage, but that required several numpy <-> pytorch conversions (e.g. for normalization), whereas the current implementation only does one conversion.
Using PyTorch tensors doesn't necessarily improve performance, mainly because of the conversions @araffin mentioned; moreover, going cpu->gpu->cpu will probably degrade performance due to transfer latency. The data is usually rather small and fits in the cache, so the transfers will probably dominate the runtime.
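To make the transfer-cost point concrete, a hedged timing sketch (the tensor shape is an arbitrary stand-in for a replay-buffer batch; requires a CUDA device):

import time
import torch

if torch.cuda.is_available():
    x = torch.rand(64, 84)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(1000):
        y = x.to("cuda")  # cpu -> gpu
        z = y.cpu()       # gpu -> cpu (blocks until the copy completes)
    torch.cuda.synchronize()
    per_trip_us = (time.perf_counter() - start) / 1000 * 1e6
    print(f"~{per_trip_us:.1f} us per cpu->gpu->cpu round trip")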
Pardon, it was a general suggestion, not related specifically to this issue.