stable-baselines3
[Optimization] Replay Buffers shouldn't use copy when using np.array
Related to #49
In https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L198-L208
https://github.com/DLR-RM/stable-baselines3/blob/23afedb254d06cae97064ca2aaba94b811d5c793/stable_baselines3/common/buffers.py#L346-L349
We call np.array(x).copy(). The .copy() is unnecessary because np.array already copies by default (its copy argument defaults to True).
https://numpy.org/doc/stable/reference/generated/numpy.array.html
import numpy as np
x = [1, 2, 3, 4, 5]
x1 = np.array(x)   # copies the list into a new array
x2 = np.array(x1)  # copies again: copy=True is the default
x2 is x1
# False
np.shares_memory(x1, x2)
# False
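For illustration, the fix amounts to dropping the redundant .copy() at the insertion site. A minimal sketch (the observations/pos attribute names mirror the linked buffers.py lines, but this is a toy buffer, not the exact patch):

import numpy as np

class MiniBuffer:
    # Toy replay buffer illustrating single-copy insertion.
    def __init__(self, size, obs_dim):
        self.observations = np.zeros((size, obs_dim), dtype=np.float32)
        self.pos = 0

    def add(self, obs):
        # np.array(obs) already returns a fresh copy (copy=True by default),
        # so a trailing .copy() would duplicate the data a second time.
        self.observations[self.pos] = np.array(obs)
        self.pos += 1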
Good catch. Mind if I include this in a PR with credit to you (#110)? If you find a bunch more such quirks and optimizations, feel free to bundle them up into a PR though :)
I can make a large one that deals with code optimisation/quirks and leave #110 for algorithm performance; it's up to you :)
Ok, let's make another PR for optimizations like this 👍. Let's wait until agent performance has been matched so things do not crisscross too badly.
After some more investigation, I believe that we don't need to copy, as the assignment is done on the original data, i.e. where the buffer is located, so
x[index] = np.array(data)
copies the data twice: once from data into the temporary array, and once more from the temporary into x.
Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the array (indeed, nothing else would make sense!).
https://numpy.org/doc/stable/user/basics.indexing.html#assigning-values-to-indexed-arrays
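A quick self-contained check of that behaviour (an illustrative snippet, not from the codebase):

import numpy as np

buffer = np.zeros((4, 3), dtype=np.float32)
data = np.ones(3, dtype=np.float32)

buffer[0] = data  # indexed assignment copies element-wise into buffer's own memory
data[0] = 99.0    # a later mutation of the source...
print(buffer[0])  # ...does not leak into the buffer: [1. 1. 1.]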
> I believe that we don't need to copy, as the assignment is done on the original data, i.e. where the buffer is located
We need to be extra careful with that one; I remember having weird issues that apparently came from changes by reference (hence the extra-conservative copy currently in the code).
But if everything is copied, then perfect.
> But if everything is copied, then perfect.
I played a bit with it in a local implementation and there seemed to be some speed degradation when I removed the second copy, so I suggest that we investigate both scenarios when the time comes.
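For when that investigation happens, a rough micro-benchmark sketch (buffer shape and iteration counts are arbitrary assumptions; absolute numbers will vary by machine):

import timeit
import numpy as np

buf = np.zeros((100_000, 84), dtype=np.float32)
obs = np.random.rand(84).astype(np.float32)

def two_copies():
    buf[0] = np.array(obs).copy()  # current behaviour: copy, then copy again

def one_copy():
    buf[0] = np.array(obs)         # np.array alone already copies

def assign_only():
    buf[0] = obs                   # indexed assignment copies element-wise by itself

for fn in (two_copies, one_copy, assign_only):
    print(fn.__name__, timeit.timeit(fn, number=100_000))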
Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.
> Another (general) optimization suggestion for numpy operations would be using numba or pytorch tensors.
Why would PyTorch tensors be faster than NumPy? The replay buffer originally used PyTorch tensors for storage, but that required several numpy <-> pytorch conversions (e.g. for normalization), whereas the current implementation only does one conversion.
Using PyTorch tensors doesn't necessarily improve performance, mainly because of the conversions @araffin mentioned; moreover, going cpu->gpu->cpu will probably degrade performance due to transfer latency. The data is usually rather small and fits in the cache, so the transfers will probably dominate the runtime.
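To make the transfer-cost point concrete, a hedged timing sketch (the tensor shape is an arbitrary stand-in for a replay-buffer batch; requires a CUDA device):

import time
import torch

if torch.cuda.is_available():
    x = torch.rand(64, 84)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(1000):
        y = x.to("cuda")  # cpu -> gpu
        z = y.cpu()       # gpu -> cpu (blocks until the copy completes)
    torch.cuda.synchronize()
    per_trip_us = (time.perf_counter() - start) / 1000 * 1e6
    print(f"~{per_trip_us:.1f} us per cpu->gpu->cpu round trip")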
Pardon, it was a general suggestion, not related specifically to this issue.