pymaid icon indicating copy to clipboard operation
pymaid copied to clipboard

Make clustering methods use parallel threads

Open schlegelp opened this issue 6 years ago • 5 comments

Just copy paste code from get_urls_threaded

schlegelp avatar Aug 28 '17 13:08 schlegelp

Pool class from multiprocessing seems to be the better choice here (see https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool) as Threading only really improves performance for non-CPU-heavy tasks (parallelism).

schlegelp avatar Aug 28 '17 21:08 schlegelp

http://sebastianraschka.com/Articles/2014_multiprocessing.html#multi-threading-vs-multi-processing

schlegelp avatar Aug 29 '17 07:08 schlegelp

https://stackoverflow.com/questions/41920124/multiprocessing-use-tqdm-to-display-a-progress-bar

schlegelp avatar Aug 29 '17 08:08 schlegelp

Quick initial tests show 2-10fold increase in speed for e.g. reroot, pruning and clustering

schlegelp avatar Aug 29 '17 19:08 schlegelp

Partially implemented with commit 46f933ce7f0e97738820efa37d79b5a144da4af2

schlegelp avatar Aug 29 '17 20:08 schlegelp