
Parallelizing data load and training

Open mallela opened this issue 6 years ago • 1 comment

Hello!

I read in another issue that you load data and perform training in parallel. I was wondering how exactly you do that, because the bottleneck does not seem to be training (~0.06s per step) but the data pre-processing/fetching calls (augmentation using an imgaug Sequential pipeline: ~0.8s; loading .h5: ~0.2s). I am using a batch size of 120.

Are you using multiprocessing or the TF data input pipeline?

Thanks, Praneeta

mallela avatar Feb 07 '19 00:02 mallela

Hi Praneeta! In TensorFlow, the dataset.map() method has a num_parallel_calls parameter.

See how we use it in the training input pipeline for this paper: https://github.com/merantix/imitation-learning/blob/master/imitation/input_fn.py#L100
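For context, here is a minimal sketch of that idea in a recent TensorFlow version. The file pattern, the parse_and_augment function, and the batch size of 120 are illustrative assumptions, not the repo's actual pipeline:

```python
import tensorflow as tf

# Hypothetical preprocessing step standing in for the expensive
# decoding/augmentation work described above.
def parse_and_augment(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_png(image, channels=3)
    image = tf.image.random_flip_left_right(image)
    return image

dataset = (
    tf.data.Dataset.list_files("data/*.png")  # hypothetical file pattern
    # Run the expensive preprocessing on several CPU threads at once.
    .map(parse_and_augment, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(120)
    # Prepare the next batch while the current training step runs.
    .prefetch(1)
)
```

Note that num_parallel_calls parallelizes the preprocessing itself across CPU threads, while prefetch(1) is what overlaps input-pipeline work with the training step, i.e. the "load data and train in parallel" behavior asked about above.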

markus-hinsche avatar Feb 07 '19 07:02 markus-hinsche