pymaid
Make clustering methods use parallel threads
The threading code from get_urls_threaded could simply be copied over as a starting point.
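The Pool class from multiprocessing seems to be the better choice here (see https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool): threading only really improves performance for tasks that are not CPU-heavy (e.g. waiting on network requests), because CPython's global interpreter lock keeps threads from executing Python code in parallel. CPU-heavy work such as clustering needs separate processes to get real parallelism.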
http://sebastianraschka.com/Articles/2014_multiprocessing.html#multi-threading-vs-multi-processing
https://stackoverflow.com/questions/41920124/multiprocessing-use-tqdm-to-display-a-progress-bar
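A minimal sketch of what the Pool-based approach with a progress bar could look like, along the lines of the two links above. This is not pymaid's actual code: `parallel_map`, `total_path_length`, `n_cores` and `progress` are hypothetical names, and `total_path_length` is only a stand-in for CPU-heavy per-neuron work such as rerooting or pruning. tqdm is treated as optional and skipped if it is not installed.

```python
import multiprocessing as mp

try:
    from tqdm import tqdm  # optional third-party progress bar
except ImportError:
    tqdm = None


def total_path_length(points):
    """Hypothetical CPU-bound stand-in for per-neuron work (e.g. pruning)."""
    return sum(abs(b - a) for a, b in zip(points, points[1:]))


def parallel_map(func, items, n_cores=None, progress=True):
    """Apply func to every item in a pool of worker processes.

    Results come back in input order. n_cores=None lets Pool use all
    available cores; n_cores=1 runs serially without starting any workers.
    """
    items = list(items)
    if n_cores == 1 or len(items) < 2:
        return [func(item) for item in items]

    with mp.Pool(processes=n_cores) as pool:
        # imap yields results lazily and in order, so a progress bar
        # wrapped around it advances as each item finishes.
        results = pool.imap(func, items)
        if progress and tqdm is not None:
            results = tqdm(results, total=len(items))
        # Consume everything before the pool is torn down.
        return list(results)


if __name__ == "__main__":
    neurons = [[0, 3, 1, 7], [2, 2, 5], [10, 4]]
    print(parallel_map(total_path_length, neurons, n_cores=2))
```

The function passed to the pool has to be defined at module level so it can be pickled, and each item is copied to a worker process, so for very small per-item workloads that overhead can outweigh the gain.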
Quick initial tests show a 2- to 10-fold increase in speed for e.g. reroot, pruning and clustering.
Partially implemented with commit 46f933ce7f0e97738820efa37d79b5a144da4af2