Get 10x Speedup in Tensorflow Multi-Task Learning using Python Multiprocessing · Han Xiao Tech Blog

Open hanxiao opened this issue 7 years ago • 7 comments

https://hanxiao.github.io/2017/07/07/Get-10x-Speedup-in-Tensorflow-Multi-Task-Learning-using-Python-Multiprocessing/

hanxiao avatar Jan 25 '18 21:01 hanxiao

Migrated from Disqus Yuxin Wu commented on 2017-07-10T00:35:59Z

If you're working with large images, you'll soon find that multiprocessing.Queue is very slow. My code in tensorpack uses multiprocessing + zmq to load data even faster.
https://github.com/ppwwyyxx...
http://tensorpack.readthedo...

hanxiao avatar Jan 26 '18 08:01 hanxiao
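
To check the point about multiprocessing.Queue on your own machine, here is a minimal timing sketch. It is not the tensorpack code and does not use zmq. The function names, the 4 MB payload standing in for a large image, and the "fork" start method (Linux/macOS) are all assumptions made for illustration.

```python
import multiprocessing as mp
import time

# Assumption: "fork" keeps the sketch short (Linux/macOS only). Under "spawn",
# call measure_queue() from inside `if __name__ == "__main__":`.
ctx = mp.get_context("fork")


def producer(q, n_items, payload_bytes):
    payload = bytes(payload_bytes)  # stand-in for one large image
    for _ in range(n_items):
        q.put(payload)  # every put pickles the payload and writes it to a pipe
    q.put(None)  # sentinel: tells the consumer there is nothing more to read


def measure_queue(n_items=20, payload_bytes=4 * 1024 * 1024):
    """Push n_items payloads through a Queue; return (count, bytes, seconds)."""
    q = ctx.Queue(maxsize=4)
    p = ctx.Process(target=producer, args=(q, n_items, payload_bytes))
    start = time.perf_counter()
    p.start()
    received = 0
    total_bytes = 0
    while True:
        item = q.get()
        if item is None:
            break
        received += 1
        total_bytes += len(item)
    p.join()
    elapsed = time.perf_counter() - start
    return received, total_bytes, elapsed
```

Dividing the returned byte count by the elapsed seconds gives a rough throughput figure to compare against whatever transport you try next. The number depends heavily on the machine and payload size, and the elapsed time also includes starting the worker process.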

Migrated from Disqus Han Xiao commented on 2017-07-10T06:49:35Z

thanks yuxin, good to know. will definitely try your code!

hanxiao avatar Jan 26 '18 08:01 hanxiao

Migrated from Disqus Han Xiao commented on 2017-07-10T13:35:15Z

i starred your repo, really awesome work!

hanxiao avatar Jan 26 '18 08:01 hanxiao

Migrated from Disqus Yuxin Wu commented on 2017-07-10T15:21:33Z

Thanks :)

hanxiao avatar Jan 26 '18 08:01 hanxiao

Migrated from Disqus Tian Lan commented on 2018-01-17T19:46:27Z

The problem with multiprocessing is that it is copy-based, not reference-based: whenever you feed data to another process, Python creates a duplicate copy for that process to consume. This can be very problematic for some applications. That said, tensorflow's multi-threaded queue is a mystery: it can be significantly slower than just using feed_dict, but you can also observe the opposite effect. So for your specific problem, this approach looks nice.

hanxiao avatar Jan 26 '18 08:01 hanxiao
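
The copy-versus-reference point can be seen in a small sketch. A buffer passed to a child process is the child's private copy, so the parent never sees the child's writes. A block from the standard library's multiprocessing.shared_memory (Python 3.8+) is attached by name and is not copied. The function names are hypothetical, the "fork" start method is an assumption, and shared memory is only one way around the copying, not something the original post uses.

```python
import multiprocessing as mp
from multiprocessing import shared_memory

# Assumption: "fork" keeps the sketch short (Linux/macOS only). Under "spawn",
# call the demo functions from inside `if __name__ == "__main__":`.
ctx = mp.get_context("fork")


def mutate_copy(buf):
    # Runs in the child: this bytearray is the child's own copy.
    buf[0] = 255


def mutate_shared(name):
    # Runs in the child: attach to the parent's block by name, nothing is copied.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[0] = 255
    shm.close()


def copy_demo():
    data = bytearray(4)
    p = ctx.Process(target=mutate_copy, args=(data,))
    p.start()
    p.join()
    return data[0]  # the parent's buffer is untouched by the child's write


def shared_demo():
    shm = shared_memory.SharedMemory(create=True, size=4)
    try:
        shm.buf[0] = 0
        p = ctx.Process(target=mutate_shared, args=(shm.name,))
        p.start()
        p.join()
        return shm.buf[0]  # the child's write is visible to the parent
    finally:
        shm.close()
        shm.unlink()  # release the block once nobody needs it
```

Calling copy_demo() returns 0 and shared_demo() returns 255. The price of the shared block is that you manage its lifetime yourself (close and unlink) and handle any synchronization between writers.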

Can you share your code and dataset so we can learn from them?

Bigwode avatar May 24 '18 02:05 Bigwode

Hi Hanxiao, thanks for your tutorial. How do you know tf.Dataset uses multiprocessing instead of multithreading? Also, I find that the Event doesn't really work. In many cases I send the stop event, but the program still hangs at the pool stage; the subprocesses just don't seem to get the stop signal. It takes a very long time to terminate the subprocesses.

kunrenzhilu avatar Sep 21 '18 09:09 kunrenzhilu
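
One possible cause of the hang described above, offered as a guess since the code in question is not shown: a worker blocked inside a put() on a full queue never gets back to the line that checks the Event. The sketch below uses a put with a timeout so the worker re-checks the stop flag regularly, and the parent falls back to terminate() if the join times out. The names producer and run_and_stop are hypothetical, and the "fork" start method is an assumption.

```python
import multiprocessing as mp
import queue

# Assumption: "fork" keeps the sketch short (Linux/macOS only). Under "spawn",
# call run_and_stop() from inside `if __name__ == "__main__":`.
ctx = mp.get_context("fork")


def producer(out_q, stop):
    i = 0
    while not stop.is_set():
        try:
            # A timeout keeps the worker from blocking forever on a full queue.
            out_q.put(i, timeout=0.1)
            i += 1
        except queue.Full:
            continue  # queue still full: loop back and re-check the stop flag
    # Do not wait at exit for buffered items to be flushed to the pipe.
    out_q.cancel_join_thread()


def run_and_stop(n_items=5):
    out_q = ctx.Queue(maxsize=2)
    stop = ctx.Event()
    worker = ctx.Process(target=producer, args=(out_q, stop))
    worker.start()
    items = [out_q.get() for _ in range(n_items)]
    stop.set()  # ask the worker to finish
    worker.join(timeout=5)
    if worker.is_alive():  # last resort if the worker ignored the stop flag
        worker.terminate()
        worker.join()
    return items, worker.exitcode
```

With a single producer the items arrive in order, and the worker exits on its own with exit code 0 shortly after stop.set(). Note that cancel_join_thread() can drop items still sitting in the queue's buffer, which is acceptable here only because the parent has stopped reading.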