How to efficiently run in parallel

ma-sadeghi opened this issue on Apr 28 '20 • 11 comments

Hey @TomTranter,

I'm trying to run pytrax in parallel. I already put my script body inside a block:

if __name__ == "__main__":
    # Body

However, I don't get a significant speed-up when changing the num_proc argument. The image I'm running the simulation on is roughly 200^3 voxels, and I use 100,000 walkers and 1,000 time steps. Here are the run times for num_proc = [1, 2, 4, 8] on a machine with 8 physical cores (a sketch of the full driver follows the numbers):

Elapsed time in seconds: 33.01
Elapsed time in seconds: 33.29
Elapsed time in seconds: 27.83
Elapsed time in seconds: 25.13
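
For reference, here's roughly what the full driver looks like (the image below is a random placeholder standing in for the real 200^3 geometry; the rw.run signature matches the traceback further down):

import time
import numpy as np
import pytrax as pt

if __name__ == "__main__":
    # Placeholder geometry: True = pore (walkable), False = solid
    im = np.random.rand(200, 200, 200) > 0.3

    for num_proc in [1, 2, 4, 8]:
        rw = pt.RandomWalk(im)
        start = time.time()
        rw.run(nt=1000, nw=100000, same_start=False, stride=1,
               num_proc=num_proc)
        print('Elapsed time in seconds:', round(time.time() - start, 2))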

ma-sadeghi avatar Apr 28 '20 14:04 ma-sadeghi

1000 time steps isn't very much, so the overhead of spinning up extra processes eats into the gains; you should see better speed-ups for longer simulations.

TomTranter avatar Apr 28 '20 15:04 TomTranter

Indeed, 1000 time steps won't even give you valid results... it should be 100,000 or more, right?

jgostick avatar Apr 28 '20 15:04 jgostick

I guess that's a bit of trial and error, but certainly 1000 isn't enough even for a relatively small image. Each step is along one axis only, so right away you're down to ~333 steps per direction, and your image is around that size. You can plot the MSD and increase the number of steps until it straightens out (see the sketch below). Also be careful of walkers getting stuck at the edges as well as in blind pores: when walkers leave the image they travel in a reflected copy of it. Really, you want to make sure you are only probing the largest fully connected cluster of voxels, which is a bit of a limitation.
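
A sketch of that convergence check, assuming pytrax's plot_msd helper (the geometry is again a random placeholder):

import numpy as np
import pytrax as pt

if __name__ == "__main__":
    im = np.random.rand(200, 200, 200) > 0.3  # placeholder geometry

    for nt in [1000, 10000, 100000]:
        rw = pt.RandomWalk(im)
        rw.run(nt=nt, nw=10000, same_start=False, stride=10, num_proc=8)
        # The MSD vs. step curve should straighten into a line once nt
        # is large enough; persistent curvature means more steps needed
        rw.plot_msd()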

TomTranter avatar Apr 28 '20 15:04 TomTranter

It doesn't let me go that far. Here's the output for nw=10,000 and nt=20,000:

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/process.py", line 205, in _sendback_result
    exception=exception))
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/queues.py", line 364, in put
    self._writer.send_bytes(obj)
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "effective_prop.py", line 103, in <module>
    tau_rw = calc_tau_rw(im=crop(void, frac=0.1), nt=20000, nw=10000, ax=1, num_proc=nproc)
  File "effective_prop.py", line 20, in calc_tau_rw
    rw.run(nt=nt, nw=nw, same_start=False, stride=1, num_proc=num_proc)
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/site-packages/pytrax/__RandomWalk__.py", line 294, in run
    mapped_coords = list(pool.map(self._run_walk, batches))
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/process.py", line 483, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

ma-sadeghi avatar Apr 28 '20 18:04 ma-sadeghi

Seems like an int overflow issue. That number is exactly (2^32)/2, the limit of a signed 32-bit integer.
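
A back-of-envelope check supports that: multiprocessing frames each pickled message with a signed 32-bit length, and the walk history returned by each worker scales as nt x nw. The 3 x 8-byte layout below is an assumption about how pytrax stores the coordinates:

# Rough size of the walk history a worker sends back, assuming one
# (x, y, z) triple of 8-byte integers per walker per recorded step
nt, nw, ndim, nbytes = 20000, 10000, 3, 8

for stride in [1, 10, 100]:
    payload = (nt // stride) * nw * ndim * nbytes
    fits = payload <= 2**31 - 1  # signed 32-bit message-length limit
    print('stride=%3d: ~%.1f GB, fits: %s' % (stride, payload / 1e9, fits))

With stride=1 that works out to roughly 4.8 GB, well past the limit, which is consistent with a larger stride making the error go away.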

jgostick avatar Apr 28 '20 18:04 jgostick

This is the best I could get: nw=10,000 and nt=10,000 (num_proc = [1, 2, 4, 8]):

Elapsed time in seconds: 19.82
Elapsed time in seconds: 22.27
Elapsed time in seconds: 21.29
Elapsed time in seconds: 18.77

ma-sadeghi avatar Apr 28 '20 18:04 ma-sadeghi

What about increasing stride?

TomTranter avatar Apr 29 '20 07:04 TomTranter

Thanks @TomTranter. It's much better with stride=10 or even stride=100. Here are the results for nt=100,000, nw=10,000, stride=100:

Elapsed time in seconds: 112.59
Elapsed time in seconds: 66.02
Elapsed time in seconds: 48.41
Elapsed time in seconds: 45.39

ma-sadeghi avatar Apr 29 '20 22:04 ma-sadeghi

It's hard to profile multiprocessed code, but the fact that stride makes such a big difference suggests that the data transfer is slowing it down, not the computation. I have experimented with shared-memory arrays in some other code, which may solve this problem (rough sketch below). Alternatively, it may be time to overhaul the multiprocessing backend and look at dask, as @jgostick suggests.
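
Something along these lines, where the image is written to a shared block once and workers attach to it by name instead of receiving a pickled copy (illustrative only, not pytrax code; needs the shared_memory module from Python 3.8+):

import numpy as np
from multiprocessing import shared_memory
from concurrent.futures import ProcessPoolExecutor

def walk_batch(shm_name, shape, dtype, seed):
    # Attach to the image already sitting in shared memory; nothing is
    # pickled across the process boundary except the name and metadata
    shm = shared_memory.SharedMemory(name=shm_name)
    im = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    rng = np.random.default_rng(seed)
    result = int(rng.integers(0, im.shape[0]))  # placeholder for a real walk
    shm.close()
    return result

if __name__ == "__main__":
    im = (np.random.rand(200, 200, 200) > 0.3).astype(np.uint8)
    shm = shared_memory.SharedMemory(create=True, size=im.nbytes)
    shared = np.ndarray(im.shape, dtype=im.dtype, buffer=shm.buf)
    shared[:] = im  # copy the image into shared memory once
    try:
        with ProcessPoolExecutor(max_workers=8) as pool:
            futures = [pool.submit(walk_batch, shm.name, im.shape,
                                   im.dtype, seed) for seed in range(8)]
            results = [f.result() for f in futures]
    finally:
        shm.close()
        shm.unlink()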

TomTranter avatar Sep 25 '20 10:09 TomTranter

Does the stride parameter affect the accuracy of the tortuosity calculation, and if so, to what extent? Also, does running in parallel require importing the multiprocessing library and adding code of my own, beyond setting the num_proc parameter? I have also noticed that increasing num_proc does not speed up the computation. My image has dimensions (31500, 28000), and I would like to go larger if possible.

pppppink avatar Jul 18 '23 15:07 pppppink

I haven't looked at this code for a while, but I think stride is just for reporting, so it shouldn't affect accuracy. multiprocessing is a standard Python library, so you should have it already; no extra code is needed beyond num_proc. There's some setup cost involved, though, so it doesn't speed up small simulations, and it parallelizes by walkers, not by time steps (toy sketch below), so long simulations with few walkers will see no difference.
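
To illustrate that last point, the parallel section is shaped roughly like this (a toy version, not pytrax's actual code): the pool maps a walk function over batches of walkers, and each batch runs all nt time steps sequentially.

import numpy as np
from concurrent.futures import ProcessPoolExecutor

def run_walk_batch(walker_ids):
    # Each process advances its own batch of walkers through every time
    # step; the time loop itself is inherently sequential
    return list(walker_ids)  # placeholder for real walk coordinates

if __name__ == "__main__":
    nw, num_proc = 10000, 8
    batches = np.array_split(np.arange(nw), num_proc)
    with ProcessPoolExecutor(max_workers=num_proc) as pool:
        results = list(pool.map(run_walk_batch, batches))
    # More walkers -> more parallel work per process; more time steps
    # only make each serial batch take longer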

TomTranter avatar Jul 18 '23 15:07 TomTranter