parmap
Are you a parmap user? Please enter
Hi,
I'm curious to know who is using parmap and for what purpose. Sometimes I believe there are no users out there, and then I feel happy when someone pops by and opens an issue. If you are using parmap and want to leave a note, please do so here; I would be very happy to know what parmap is being used for. Once you have answered, feel free to click "Unsubscribe" on the right if you don't want to receive further notifications from other parmap users.
For instance here is one user that wrote me about his paper on spinning black hole binaries where he had used parmap:
- Davide Gerosa and Michael Kesden, "PRECESSION: Dynamics of spinning black-hole binaries with python." Phys. Rev. D 93, 124066 (27 June 2016). arXiv:1605.01067
Thanks!
I actually found parmap on Stack Overflow whilst looking for a nice py2+py3 way to provide constant variables to map. Finding that it supported tqdm was very pleasant. I'm using it to help me process about 300GB of seismic data, which I hand off to parmap for analysis calculations. Thanks for the useful library!
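For anyone curious, that is the core convenience: extra positional arguments to parmap.map are passed as constants to every call, and pm_pbar enables the tqdm progress bar. A minimal sketch (the function and data here are invented for illustration):

```python
import parmap

def scale(x, factor):
    # 'factor' is a constant argument shared by every call
    return x * factor

data = list(range(1000))

# One call replaces pool.map(partial(scale, factor=2), data);
# pm_pbar=True shows a tqdm progress bar while mapping.
results = parmap.map(scale, data, 2, pm_pbar=True)
```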
I'm using it for custom scikit-learn estimators.
You could attract potential users by adding parmap as an answer to related questions on Stack Overflow (e.g. https://stackoverflow.com/q/9911819). Indeed, I found it the best solution of those I tested. You should state that you're the author, though.
Thanks for the tip. I am not actively searching for more users, though. It's great if they find parmap and like it, and I will talk about parmap to anyone who might be interested. However, I can't spend time looking for users who might like parmap right now, and if those users came I would need to spend even more time fixing issues. So, when I have the time I will start actively looking for more users. Until then they will have to find parmap on their own. Feel free to tell others about parmap if you want, though.
I am currently using parmap for my master's thesis about emotion detection in tweets.
Just found parmap and am loving it; it saved me a lot of partial and pool calls! As for the application: signal analysis for single-photon detectors.
One line of code for parallel computation, with a progress bar. I love this tiny tool very much and use it everywhere I need parallelization.
Hi - I am using parmap for generating nodes in knowledge graphs. A couple of questions:
- If `pm_processes` is not passed, does the number of processes scale to the maximum available?
- If each item in the list spawns a long process, is chunking a good way to speed things up further?
@gryBox
Empty pm_processes
If `pm_processes` is not passed, parmap follows the multiprocessing.Pool defaults and therefore uses os.cpu_count().
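As a quick illustration of that default (the worker function is invented for the example):

```python
import os
import parmap

def square(x):
    return x * x

print(os.cpu_count())  # the pool size parmap will use by default

# No pm_processes given: one worker per CPU, as in multiprocessing.Pool()
results = parmap.map(square, range(100))

# Explicitly capping the pool size instead
results = parmap.map(square, range(100), pm_processes=2)
```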
About chunksize values
By default, the chunksize is len(iterable)/(4*pm_processes), rounded up if necessary. This is also the default in multiprocessing. If you have 200 tasks and 5 parallel processes, chunksize = 200/(4*5) = 10.
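For reference, this is essentially the round-up computation multiprocessing.pool performs internally; a standalone sketch:

```python
def default_chunksize(num_tasks, num_processes):
    # Mirrors multiprocessing.pool.Pool._map_async:
    # chunksize, extra = divmod(len(iterable), len(pool) * 4)
    chunksize, extra = divmod(num_tasks, num_processes * 4)
    if extra:
        chunksize += 1  # round up so no tasks are left over
    return chunksize

print(default_chunksize(200, 5))  # 10, as in the example above
print(default_chunksize(201, 5))  # 11 (rounded up)
```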
I will try to explain why that default is reasonable by going to the extremes:
chunksize = 1
Using a chunksize of 1 would mean that each task is submitted individually. As soon as one task is finished, the main process submits another one. This would be fine if submitting a task had no overhead, which is not the case. If each task takes a short time to finish, such a small chunksize means that multiprocessing spends a comparatively large amount of time submitting data and collecting results. In this case, parallelizing with chunksize=1 could make the code run slower than the serial version.
chunksize = number of tasks
If you create just one big chunk, you can only send it to one process, so you can't parallelize at all. This is an absurdly high value; instead of using it, simply disable parallelization.
chunksize = num_tasks/num_processes
You split your tasks into as many groups as there are parallel processes. This minimizes the number of submissions, so the overhead is minimal. It may seem like a very smart approach, but what happens if tasks take different amounts of time to complete? With bad luck, one of your processes may get one or several long tasks, and while the other processes have finished, you will have to wait for that one process to work through multiple tasks. Since all tasks have already been submitted, the idle processes can't help the one that was given too much work.
chunksize = num_tasks/(4*num_processes)
This is a reasonable tradeoff. Each process gets on average 4 submissions of tasks. If one task is much longer than the rest, the process holding it will probably get only 2 or 3 submissions while the other processes get 5 each. The overhead is a little bigger, but the benefit in the general case is much larger.
chunksize summary
In summary, the default is usually good enough. If you have a huge number of uniformly very short tasks, a larger chunksize may be significantly beneficial. I haven't done any formal benchmark; feel free to do so if you want.
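If you do want to experiment, parmap forwards a pm_chunksize keyword to the underlying pool (per its README); a hedged sketch with an invented workload:

```python
import parmap

def tiny_task(x):
    # An extremely short task: submission overhead dominates
    return x + 1

data = range(1_000_000)

# Default chunking: len(data) / (4 * processes), rounded up
results = parmap.map(tiny_task, data)

# Forcing larger chunks to cut per-submission overhead
# (pm_chunksize is passed on to the underlying Pool.map)
results = parmap.map(tiny_task, data, pm_chunksize=50_000)
```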
@zeehio Thank you, that is a clear and easy explanation. Leaving things at the default for now. Wonderful tool!
tagbase-server uses parmap to asynchronously process biologging data from electronic tags deployed on various marine animals. This is an excellent utility library. Thank you @zeehio 👍
@zeehio I used parmap to target 24 million GitHub repos for their language dependency files a few years ago. This was part of some security analysis I was doing during my Master's. Very glad this tool existed, especially since I didn't want to move to a compiled language for multiprocessing stuff.