robotics-rl-srl

[feature improving request (srl_zoo)] Around 30% speed-up by changing few lines code

Open ncble opened this issue 5 years ago • 4 comments

Problem description

In the current (origin/master) version, there are two preprocessing modes: 1. 'tf' and 2. 'image_net'. I have noticed that with the same device/Python environment/encoder model/robot environment etc., the 'image_net' option provides a 30% speed-up compared to 'tf'.

Reproduce the problem

  • Under robotics-rl-srl/
    $ python -m environments.dataset_generator --env MobileRobotGymEnv-v0 --name mobile2D_fixed_tar_seed_0 --seed 0 --num-cpu 8
  • Modify the script srl_zoo/preprocessing/utils.py so that the preprocessing mode is 'tf'
  • Under srl_zoo/ $ python train.py --data-folder mobile2D_fixed_tar_seed_0 --losses autoencoder

With the original version of srl_zoo, the training time per epoch of the autoencoder (under 'tf' mode) is about 43s on my computer; with the following modification, the time drops to 31s.

Solution

I propose to change the script srl_zoo/preprocessing/utils.py (both the functions preprocessInput and deNormalize).

  def preprocessInput(x, mode="tf"):
      ....
      assert x.shape[-1] == 3, "Color channel must be at the end of the tensor {}".format(x.shape)
      x /= 255.
      if mode == "tf":
          # x -= 0.5
          # x *= 2.
          # The following is 33% faster than the commented-out code above.
          x[..., 0] -= 0.5
          x[..., 1] -= 0.5
          x[..., 2] -= 0.5
          x[..., 0] *= 2.
          x[..., 1] *= 2.
          x[..., 2] *= 2.

deNormalize is similar.
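As a hedged illustration only (the exact body of deNormalize in srl_zoo may differ), the analogous per-channel rewrite could look like the sketch below. It assumes the 'tf' branch of deNormalize inverts the normalization, mapping [-1, 1] back to [0, 1]; the function name deNormalize_sketch is made up to avoid confusion with the real one:

```python
import numpy as np

def deNormalize_sketch(x, mode="tf"):
    """Hypothetical inverse of the 'tf' preprocessing (assumes x in [-1, 1]).

    Mirrors the proposed per-channel in-place change to preprocessInput.
    """
    assert x.shape[-1] == 3, "Color channel must be at the end of the tensor {}".format(x.shape)
    if mode == "tf":
        # Per-channel, in-place version of: x = x / 2. + 0.5
        x[..., 0] /= 2.
        x[..., 1] /= 2.
        x[..., 2] /= 2.
        x[..., 0] += 0.5
        x[..., 1] += 0.5
        x[..., 2] += 0.5
    return x
```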

ncble avatar May 24 '19 13:05 ncble

This is unexpected, but why not if the result is the same.

araffin avatar May 24 '19 14:05 araffin

Agreed! I was shocked when I discovered this. Do you have any idea how this happens? I guess it's related to the multiprocessing?

ncble avatar May 28 '19 09:05 ncble

I couldn't reproduce your results...

minimal code:

import numpy as np

def prepro(x, mode='one'):
    x /= 255.
    if mode == 'one':
        x -= 0.5
        x *= 2.
    else:
        x[..., 0] -= 0.5
        x[..., 1] -= 0.5
        x[..., 2] -= 0.5
        x[..., 0] *= 2.
        x[..., 1] *= 2.
        x[..., 2] *= 2.
    return x


image = 255. * np.random.random((224, 224, 3))

a = prepro(image.copy())
b = prepro(image.copy(), mode='test')

assert np.allclose(a, b)

and in a ipython console:

In [1]: from test import prepro, image

In [2]: %timeit prepro(image.copy(), mode='test')
782 µs ± 9.54 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit prepro(image.copy())
535 µs ± 7.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: %timeit prepro(image.copy())
534 µs ± 5.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

araffin avatar May 28 '19 11:05 araffin

Yes, I agree that these two methods alone give the same results and timings, but when you call them through data_loader.py (please change the scripts as I described above), the elapsed time is significantly different! That's why I guess the problem is related to multiprocessing.
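One way to probe the multiprocessing hypothesis would be to time both variants in the main process and again inside worker processes. This is a standalone sketch, not srl_zoo code; the names whole_array, per_channel, time_fn, and worker are made up for illustration:

```python
import time
from multiprocessing import Pool

import numpy as np

def whole_array(x):
    # Normalize the whole array at once: [0, 255] -> [-1, 1]
    x /= 255.
    x -= 0.5
    x *= 2.
    return x

def per_channel(x):
    # Same normalization, applied channel by channel in place
    x /= 255.
    for c in range(3):
        x[..., c] -= 0.5
        x[..., c] *= 2.
    return x

def time_fn(fn, n=200):
    img = 255. * np.random.random((224, 224, 3))
    start = time.perf_counter()
    for _ in range(n):
        fn(img.copy())
    return time.perf_counter() - start

def worker(name):
    fn = whole_array if name == "whole" else "channel" and per_channel
    return name, time_fn(fn)

if __name__ == "__main__":
    # Compare timings in the main process vs inside a Pool; allocator and
    # memory behaviour can differ across processes, which might explain
    # the discrepancy seen through data_loader.py.
    for name, fn in [("whole", whole_array), ("channel", per_channel)]:
        print("main process {}: {:.3f}s".format(name, time_fn(fn)))
    with Pool(4) as pool:
        for name, t in pool.map(worker, ["whole", "channel"] * 2):
            print("worker {}: {:.3f}s".format(name, t))
```

If the per-channel variant only wins inside the workers, that would support the multiprocessing explanation rather than a difference in the functions themselves.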

ncble avatar May 28 '19 12:05 ncble