imgaug
Multiprocessing and tensorflow 2.x
Hey,
Background
We make heavy use of imgaug for image augmentation. We do it in a custom generator, which we built to feed Keras' fit_generator function. For performance reasons we use multiprocessing with fit_generator, which causes Keras to spawn a pool of workers that each handle generated items.
Issue
Since tf 2.x we have started to get the following warning from tf:
WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
[2020-05-11 07:49:40,808] - WARNING: multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
Any idea how to overcome this?
I recall that imgaug is not thread-safe.
imgaug should be thread-safe, I think. You only have to be careful with the seed that each child process uses; otherwise you risk that all workers use the same seed and generate the same transformations (just applied to different images). There is also the problem of ensuring reproducibility when you can't be sure which worker process will get which batch of data, so you might have to set the worker's seed on a per-batch basis, conditional on the batch's unique ID.
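The per-batch seeding idea can be sketched roughly as follows. This is a minimal illustration using only the standard library, not imgaug's own API: `derive_batch_seed` and `augment_batch` are hypothetical names, and the random "flip" decision merely stands in for a real augmentation. In practice the derived integer would be handed to whatever seeding mechanism the augmentation library provides.

```python
import hashlib
import random


def derive_batch_seed(base_seed: int, batch_id: int) -> int:
    """Map (base_seed, batch_id) to a stable 32-bit seed.

    Hashing keeps the result independent of which worker process
    handles the batch and of Python's per-process hash randomization.
    """
    digest = hashlib.sha256(f"{base_seed}:{batch_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def augment_batch(batch, base_seed: int, batch_id: int):
    """Stand-in augmentation (assumption): randomly reverse each item.

    Each item is a plain list of pixel values here; a real pipeline
    would seed the augmentation library with the derived seed instead.
    """
    rng = random.Random(derive_batch_seed(base_seed, batch_id))
    return [item[::-1] if rng.random() < 0.5 else item for item in batch]


batch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# The same batch ID gives the same result, no matter which worker runs it.
first = augment_batch(batch, base_seed=42, batch_id=7)
second = augment_batch(batch, base_seed=42, batch_id=7)
```

Because the seed depends only on the base seed and the batch ID, two workers never share a random stream unless they process the same batch, and a rerun reproduces the same augmentations.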
I'm not that familiar with tf.data, but as far as I know it basically comes down to generating the dataset of examples once and then (during train/eval) applying only tensorflow functions to it, i.e. no numpy data is allowed. imgaug does not have such tensorflow implementations of its operations, and therefore cannot be used in this way. The only way to still use it is to apply imgaug during the dataset generation, e.g. by saving each image not once, but ten augmented times.
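A rough sketch of that offline approach, assuming the augmentation is an arbitrary Python callable: each source image is expanded into several augmented copies before any dataset is built. `expand_dataset` and `jitter` are hypothetical names, and `jitter` only stands in for a real imgaug augmenter call.

```python
import random


def expand_dataset(images, augment, copies=10, seed=0):
    """Return `copies` augmented variants of every image in `images`."""
    rng = random.Random(seed)
    expanded = []
    for image in images:
        for _ in range(copies):
            expanded.append(augment(image, rng))
    return expanded


def jitter(image, rng):
    """Placeholder augmentation (assumption): shift all pixel values
    by one random offset and clip the result to the 0..255 range."""
    offset = rng.randint(-5, 5)
    return [min(255, max(0, px + offset)) for px in image]


images = [[10, 20, 30], [200, 210, 220]]
# Two source images, ten augmented copies each.
augmented = expand_dataset(images, jitter, copies=10)
```

The expanded list could then be written to disk or wrapped in a dataset once. The trade-off is that the augmentations are fixed up front rather than resampled every epoch.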
You're right, imgaug is not thread-safe, so when you use use_multiprocessing=True, you have to use the "forkserver" start method to create new processes and avoid deadlocks. Set it with import multiprocessing as mp followed by mp.set_start_method("forkserver") at the start of your program. The tensorflow warnings will still show up, but they can be safely ignored and you should not run into any deadlocks. When you do this, you also have to make sure that all objects in your generator are picklable.
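A minimal sketch of that setup, using only the standard library. `configure_start_method` and `is_picklable` are hypothetical helper names; the fallback to "spawn" is an assumption added here for platforms where "forkserver" is unavailable, and it is not part of the advice above.

```python
import multiprocessing as mp
import pickle


def configure_start_method(preferred: str = "forkserver") -> str:
    """Select the process start method, falling back to "spawn".

    "forkserver" exists only on some Unix platforms, so the
    fallback keeps the sketch runnable elsewhere.
    """
    method = preferred if preferred in mp.get_all_start_methods() else "spawn"
    # force=True lets the call succeed even if a method was already set.
    mp.set_start_method(method, force=True)
    return method


def is_picklable(obj) -> bool:
    """Check whether `obj` could be sent to a worker process."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # Call once, at the start of the program, before any workers exist.
    chosen = configure_start_method()
    print("start method:", chosen)
    print(is_picklable({"batch_size": 32}))  # plain data can be pickled
    print(is_picklable(lambda x: x))         # lambdas cannot
```

Running `is_picklable` over the attributes of a generator object before training is one way to find the member that would otherwise fail inside a worker.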
But, as @aleju alluded to, you might have to be careful to set a different imgaug seed for each child process, so that the workers do not all perform the same augmentations in the same order on all your data.
Can I turn off multiprocessing in imgaug, or use only multithreading?