Handle threads explicitly; add `PISA_NUM_THREADS` env var

Open jllanfranchi opened this issue 7 years ago • 13 comments

I think we should add a PISA_NUM_THREADS env var. If OMP_NUM_THREADS, MKL_NUM_THREADS, and/or NUMBA_NUM_THREADS are not defined in the environment, then PISA should set them in the env to the value of PISA_NUM_THREADS. If any of them is already defined in the env, then PISA should not change its value.

Furthermore, PISA_NUM_THREADS should default to 1 if not defined in the env by the user. This is better behavior, IMO, than the current default, since the software controlled by each of the above env vars will otherwise take as many cores as are on the system. That is often problematic for cluster jobs, where the number of cores on the system (and hence what these default to) is not the number of cores assigned to the job.

If there are any other such variables, we should set them explicitly as well if not already defined.
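
A minimal sketch of the behavior proposed above, for illustration only. The helper name `propagate_num_threads` and the tuple `THREAD_ENV_VARS` are hypothetical (they are not existing PISA code), and only the three variables named in this issue are covered:

```python
import os

# Variables named in this issue; others could be appended if they turn up.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "NUMBA_NUM_THREADS")


def propagate_num_threads(env=None):
    """Copy PISA_NUM_THREADS (default "1") into each thread variable
    that is not already defined. Hypothetical helper, not PISA API."""
    if env is None:
        env = os.environ
    num_threads = env.get("PISA_NUM_THREADS", "1")
    for name in THREAD_ENV_VARS:
        # setdefault leaves any value the user already defined untouched
        env.setdefault(name, num_threads)
    return num_threads
```

Passing a plain dict as `env` makes the logic easy to check without touching the real environment.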

jllanfranchi avatar Jan 16 '18 15:01 jllanfranchi

There should also be a PISA variable e.g. NUM_THREADS defined in __init__.py so that PISA modules can import and use that to know how many threads the user has specified.
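
As a rough sketch of what that could look like, assuming a hypothetical helper `read_num_threads` (not existing PISA code) and assuming that values below 1 should be rejected:

```python
import os


def read_num_threads(env=None):
    """Return PISA_NUM_THREADS as a positive int, defaulting to 1."""
    if env is None:
        env = os.environ
    raw = env.get("PISA_NUM_THREADS", "1")
    num_threads = int(raw)  # raises ValueError on non-integer input
    if num_threads < 1:
        raise ValueError("PISA_NUM_THREADS must be >= 1, got %r" % raw)
    return num_threads


# In pisa/__init__.py this could be exposed as a module-level constant
# that other modules import (the name NUM_THREADS follows the comment above).
NUM_THREADS = read_num_threads()
```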

jllanfranchi avatar Jan 18 '18 17:01 jllanfranchi

I'm wondering what the purpose would be behind setting globals like NUMBA_NUM_THREADS (similarly for OMP_NUM_THREADS which I think in some places in pisa plays the role of what NUM_THREADS would be supposed to do) to the value of PISA_NUM_THREADS. Is there a way to limit the number of threads numba (numpy) uses without setting NUMBA_NUM_THREADS (OMP_NUM_THREADS or MKL_NUM_THREADS) as an environment variable?

thehrh avatar Feb 13 '18 15:02 thehrh

Ah, maybe I misunderstood and the goal is not just to set the globals but to add e.g. OMP_NUM_THREADS to os.environ from within python and pass that around somehow and hope that any omp routines respect the value?

thehrh avatar Feb 13 '18 15:02 thehrh

Yes, exactly. If the user specifies a PISA environment variable like (from Bash command line)

export PISA_NUM_THREADS=1

then PISA tries to make that one variable force all (known) software underlying it to use that many threads, too. PISA does this by setting e.g. OMP_NUM_THREADS, etc. to the same value the user specified for PISA_NUM_THREADS... but PISA only sets these other env vars if they are not already defined in the user's environment.

In this way, I think it does the most sensible and user-friendly thing. You don't end up with Intel MKL, Numba, OpenMP, ... unknowingly taking over all CPU cores when you run PISA when you thought that you'd told PISA to just use 1 thread. However, you can also modify the behavior of the underlying thread-enabled software yourself if you wish by setting their env vars explicitly, and so you can have any threading behavior you want.

Similar logic is used now for the PISA_CACHE_DIR and NUMBA_CACHE_DIR env vars.

jllanfranchi avatar Feb 13 '18 16:02 jllanfranchi

How exactly is NUMBA_CACHE_DIR carried over to e.g. utils/numba_tools.py?

thehrh avatar Feb 13 '18 16:02 thehrh

Or was that supposed to be from pisa import numba_jit, and the latter would then know about NUMBA_CACHE_DIR because NUMBA_CACHE_DIR is part of the environment at import time?

thehrh avatar Feb 13 '18 16:02 thehrh

Any import from PISA (or a sub-module of PISA) will execute $PISA/pisa/__init__.py, and once that has been run, then NUMBA_CACHE_DIR will have been set appropriately. I think that the env vars like NUMBA_CACHE_DIR need to be set before the first Numba import, but I'm not quite sure about that...

But if that is the case, then the rule is that a module just needs to import something from pisa, from pisa.foo, etc., before doing anything with Numba. And if that's true, then numba_tools.py does violate this assumption. Tagging @philippeller here to see if he has thought this through.
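
If the import-order rule does hold (this is unverified in this thread), one way to catch violations would be to check `sys.modules` before setting the env vars. A sketch, with a hypothetical helper name:

```python
import sys


def imported_too_early(module_names=("numba", "numpy"), modules=None):
    """Return the names from module_names that are already imported.

    Assumption: the env vars are only read at import time. If so, a
    non-empty result means it may be too late to set them.
    """
    if modules is None:
        modules = sys.modules
    return [name for name in module_names if name in modules]
```

pisa/__init__.py could call this before touching os.environ and emit a warning if the list is not empty.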

jllanfranchi avatar Feb 13 '18 16:02 jllanfranchi

To me it looks like, if we wanted to make sure numpy respected OMP_NUM_THREADS, we couldn't just import numpy as np in any module, but would have to import it from pisa. (Adding OMP_NUM_THREADS to os.environ in __init__.py before numpy is imported has no effect on numpy routines I run within some module when that module itself does import numpy.)

thehrh avatar Feb 13 '18 17:02 thehrh

It doesn't seem to be that easy either -.- Not sure what's going on. Anyway...

thehrh avatar Feb 13 '18 18:02 thehrh

We should first test how the OMP_NUM_THREADS, MKL_NUM_THREADS, and NUMBA_NUM_THREADS env vars are treated (do they actually need to be set prior to the first import?). If they do need to be set prior to the first import, there's e.g. the following solution for setting threads for MKL; maybe there are equivalents for the others:

https://stackoverflow.com/questions/28283112/using-mkl-set-num-threads-with-numpy#28293128

jllanfranchi avatar Feb 13 '18 20:02 jllanfranchi

That is what a small interactive test suggested to me (setting os.environ['OMP_NUM_THREADS'] before and after importing numpy and in the end running np.dot and observing its threading behaviour), but then I couldn't figure out how to tell numpy anywhere throughout pisa to respect OMP_NUM_THREADS. I'll take a look at the link above and keep poking at it.
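
One way to take import order out of such a test is to run the snippet in a fresh interpreter whose environment is fixed before it starts. A sketch using only the standard library; the helper name `run_in_fresh_interpreter` is hypothetical, and the child snippet here only echoes the variable (a timed np.dot could be substituted to observe actual threading):

```python
import os
import subprocess
import sys


def run_in_fresh_interpreter(code, **env_overrides):
    """Run `code` in a new Python process with the given env vars set
    before the interpreter starts, and return its stdout as text."""
    env = dict(os.environ)
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# The child sees the value before it imports anything else.
out = run_in_fresh_interpreter(
    "import os; print(os.environ['OMP_NUM_THREADS'])",
    OMP_NUM_THREADS="1",
)
```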

thehrh avatar Feb 15 '18 18:02 thehrh

Numpy doesn't use threading explicitly, but does so implicitly if the BLAS library it employs does so. E.g. numpy "uses" MKL_NUM_THREADS if it is compiled with MKL (the Intel BLAS lib). This is the case, e.g. if you're using the Anaconda distribution of Python (I think it's the default version of numpy Continuum gives you).

Not sure whether other BLAS libraries (OpenBLAS, ATLAS, etc. are ones I've heard of) use OpenMP threading or something else.

jllanfranchi avatar Feb 15 '18 18:02 jllanfranchi

Sorry, I didn't want to get into a discussion about which variables numpy respects, but it would seem that you can set OMP_NUM_THREADS to limit numpy's implicit threading in anaconda, even when it's compiled with MKL - it's what I've observed (and what https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications seems to suggest, too?). I actually can't successfully limit the number of threads by setting MKL_NUM_THREADS in os.environ before importing numpy, even though the latter is compiled with MKL.

thehrh avatar Feb 15 '18 18:02 thehrh