Issues with numpy threading and multiprocessing

Open danieljfarrell opened this issue 4 years ago • 1 comments

I've run your script a few times on different setups. Interestingly, I obtained different results :)

The main reason is that numpy already has some multi-threaded functions (and it spawns a number of workers that depends on the Python version and platform). [1]

Once numpy is forced to a single thread [2], the best performance is obtained. Multiprocessing is always slightly better than pathos (reasonable, as pathos uses multiprocessing as its backend). [3]

With 12 cores and numpy set to 1 thread, I got a throughput_rays_per_sec of around 430 with Py 3.7.9, Py 3.8.5 and Py 3.9.1. [4]

[Images: default settings; numpy set to 1 thread]


[1] This is evident from CPU usage that is double or four times the expected value. How many workers are present depends on numpy's compile settings for accelerated linear algebra libraries (BLAS and friends) and on environment variables. More info with:

```python
import numpy as np
np.show_config()
```

[2] i.e. by setting the following environment variables before importing numpy:

```python
import os

NUMPY_THREADS = 1
# These must be set before numpy is first imported.
os.environ["MKL_NUM_THREADS"] = str(NUMPY_THREADS)
os.environ["NUMEXPR_NUM_THREADS"] = str(NUMPY_THREADS)
os.environ["OMP_NUM_THREADS"] = str(NUMPY_THREADS)
```
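Since the variables in note [2] only help if they are set before numpy is imported, it may be worth wrapping them in a small helper that checks for that. The following is a minimal sketch using only the standard library; `limit_numpy_threads` and `THREAD_ENV_VARS` are hypothetical names, not part of pvtrace or numpy, and other BLAS backends may read variables that are not listed here.

```python
import os
import sys
import warnings

# Variable names taken from note [2]; this list is an assumption and
# may be incomplete for other linear algebra backends.
THREAD_ENV_VARS = ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS")


def limit_numpy_threads(n_threads=1):
    """Set the thread-count variables (hypothetical helper).

    Returns False and emits a warning if numpy was already imported,
    because the limits may then be ignored.
    """
    already_imported = "numpy" in sys.modules
    if already_imported:
        warnings.warn("numpy is already imported; thread limits may be ignored")
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    return not already_imported
```

Calling `limit_numpy_threads(1)` as the first statement of the benchmark script, before any `import numpy`, should reproduce the single-thread setup described above.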

[3] If pathos.pools.ProcessPool is used, performance is further reduced (roughly 10%).

[4] For Py>=3.8 the following line is needed to prevent an AttributeError in multiprocessing (see: https://bugs.python.org/issue39414)

```python
import atexit
atexit.register(pool.close)  # pool is the multiprocessing pool created earlier
```
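To show where that line might sit in a benchmark, here is a rough sketch of a pool-based throughput measurement. It is not the script from the original thread: `trace_ray` is a hypothetical stand-in for the real ray-tracing work, `rays_per_second` is an illustrative helper, and the pool size of 2 is arbitrary.

```python
import atexit
import multiprocessing
import time


def trace_ray(seed):
    """Hypothetical stand-in for tracing one ray; returns a dummy value."""
    return sum(i * i for i in range(1000)) + seed


def rays_per_second(n_rays, elapsed_seconds):
    """Throughput as rays completed per second of wall-clock time."""
    return n_rays / elapsed_seconds


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    # Workaround from note [4]: close the pool when the interpreter exits.
    atexit.register(pool.close)

    start = time.perf_counter()
    results = pool.map(trace_ray, range(100))
    elapsed = time.perf_counter() - start
    print(f"{rays_per_second(len(results), elapsed):.0f} rays/s")
```

The pool is created under the `if __name__ == "__main__":` guard so that the script also works on platforms where worker processes re-import the main module.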

Originally posted by @dcambie in https://github.com/danieljfarrell/pvtrace/issues/48#issuecomment-766417135

danieljfarrell · Jan 26 '21 20:01

To-do

  1. Add scaling notes to docs https://github.com/danieljfarrell/pvtrace/pull/48#issuecomment-768866025
  2. Repeat using python script

The CLI and the Python script will have different scaling behaviour because the script retains the full ray history, which is very slow. The CLI saves reduced data when the --end-rays option is used.

danieljfarrell · Feb 05 '21 14:02