robotics-rl-srl
[Feature request (srl_zoo)] Around 30% speed-up by changing a few lines of code
Problem description
In the current (origin/master) version, there are two preprocessing modes: 1. 'tf' and 2. 'image_net'. I have noticed that with the same device/python environment/encoder model/robot environment etc., the option 'image_net' provides a 30% speed-up compared to 'tf'.
Reproduce the problem
- Under robotics-rl-srl/
$ python -m environments.dataset_generator --env MobileRobotGymEnv-v0 --name mobile2D_fixed_tar_seed_0 --seed 0 --num-cpu 8
- Modify the script srl_zoo/preprocessing/utils.py so that the preprocessing mode is 'tf'
- Under srl_zoo/
$ python train.py --data-folder mobile2D_fixed_tar_seed_0 --losses autoencoder
With the original version of srl_zoo, the training time per epoch of the autoencoder (in 'tf' mode) is about 43 s on my computer; with the following modification, it drops to about 31 s.
Solution
I propose to change the script srl_zoo/preprocessing/utils.py (both the functions preprocessInput and deNormalize):
def preprocessInput(x, mode="tf"):
    ....
    assert x.shape[-1] == 3, "Color channel must be at the end of the tensor {}".format(x.shape)
    x /= 255.
    if mode == "tf":
        # x -= 0.5
        # x *= 2.
        # The following code is 33% faster than the two lines above.
        x[..., 0] -= 0.5
        x[..., 1] -= 0.5
        x[..., 2] -= 0.5
        x[..., 0] *= 2.
        x[..., 1] *= 2.
        x[..., 2] *= 2.
deNormalize is similar.
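For reference, here is a sketch of what the analogous per-channel change to deNormalize could look like. This is illustrative only: it assumes the 'tf' branch of deNormalize inverts the normalization above (mapping [-1, 1] back to [0, 1]); the actual function body in srl_zoo may differ.

```python
import numpy as np


def deNormalize(x, mode="tf"):
    """Sketch of the inverse of the 'tf' normalization, rewritten
    with the same per-channel in-place pattern proposed above.
    Hypothetical body, not the exact srl_zoo code."""
    assert x.shape[-1] == 3, "Color channel must be at the end of the tensor {}".format(x.shape)
    if mode == "tf":
        # Instead of: x /= 2.; x += 0.5
        for c in range(3):
            x[..., c] /= 2.
            x[..., c] += 0.5
    return x
```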
This is unexpected, but why not, if the result is the same.
Agreed! I was shocked when I discovered this. Do you have any idea how this happens? I guess it's related to multiprocessing?
I couldn't reproduce your results...
minimal code:
import numpy as np
def prepro(x, mode='one'):
    x /= 255.
    if mode == 'one':
        x -= 0.5
        x *= 2.
    else:
        x[..., 0] -= 0.5
        x[..., 1] -= 0.5
        x[..., 2] -= 0.5
        x[..., 0] *= 2.
        x[..., 1] *= 2.
        x[..., 2] *= 2.
    return x
image = 255. * np.random.random((224, 224, 3))
a = prepro(image.copy())
b = prepro(image.copy(), mode='test')
assert np.allclose(a, b)
and in an IPython console:
In [1]: from test import prepro, image
In [2]: %timeit prepro(image.copy(), mode='test')
782 µs ± 9.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: %timeit prepro(image.copy())
535 µs ± 7.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [4]: %timeit prepro(image.copy())
534 µs ± 5.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Yes, I agree that these two methods alone are the same (in terms of results and timing), but when you call them from data_loader.py (please change the scripts as I described above), the elapsed time is significantly different! That's why I guess the problem is related to multiprocessing.
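One way to probe the multiprocessing hypothesis in isolation could be to time both variants inside a pool of worker processes, similar to how the data loader dispatches preprocessing. The sketch below is self-contained and illustrative: the function names (prepro_whole, prepro_channel, bench), image count, and pool size are chosen for this example and are not the actual data_loader.py code.

```python
import time
from multiprocessing import Pool

import numpy as np


def prepro_whole(x):
    """Normalize with whole-array in-place operations ('tf' style)."""
    x /= 255.
    x -= 0.5
    x *= 2.
    return x


def prepro_channel(x):
    """Same normalization, applied channel by channel."""
    x /= 255.
    for c in range(3):
        x[..., c] -= 0.5
        x[..., c] *= 2.
    return x


def bench(fn, n_images=64, n_workers=8):
    """Time fn over a batch of random images mapped across worker processes."""
    images = [255. * np.random.random((224, 224, 3)) for _ in range(n_images)]
    with Pool(n_workers) as pool:
        start = time.time()
        pool.map(fn, images)
        return time.time() - start


if __name__ == "__main__":
    print("whole-array :", bench(prepro_whole))
    print("per-channel :", bench(prepro_channel))
```

If the timing gap only appears here (and not in single-process %timeit runs), that would support the idea that the interaction with worker processes, rather than the arithmetic itself, explains the speed-up.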