pisa
Handle threads explicitly; add `PISA_NUM_THREADS` env var
I think we should add a `PISA_NUM_THREADS` env var. If `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, and/or `NUMBA_NUM_THREADS` are not defined in the environment, then PISA should set them to the value of `PISA_NUM_THREADS`. If any of them is already defined in the env, then PISA should not change its value.
Furthermore, `PISA_NUM_THREADS` should default to 1 if the user has not defined it in the env. This is better behavior, IMO, than the default, since each of the above env vars will otherwise take as many cores as are on the system, which is often problematic for cluster jobs, where the number of cores on the system (and hence what these default to) is not the number of cores assigned to the job.
If there are any other such variables, we should set them explicitly as well if not already defined.
There should also be a PISA variable, e.g. `NUM_THREADS`, defined in `__init__.py` so that PISA modules can import it and know how many threads the user has specified.
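A minimal sketch of what that could look like in `pisa/__init__.py` (the default of 1 and the set of underlying env vars follow the proposal above; `os.environ.setdefault` implements the "only if not already defined" rule):

```python
import os

# Default PISA_NUM_THREADS to 1 unless the user already set it in the env.
NUM_THREADS = int(os.environ.setdefault("PISA_NUM_THREADS", "1"))

# Propagate to the known thread-controlling env vars, but only if the user
# has not defined them; a pre-existing value is left untouched.
for _var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "NUMBA_NUM_THREADS"):
    os.environ.setdefault(_var, str(NUM_THREADS))
```

Any PISA module could then do `from pisa import NUM_THREADS` to learn the user's setting.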
I'm wondering what the purpose would be behind setting globals like `NUMBA_NUM_THREADS` (similarly for `OMP_NUM_THREADS`, which I think in some places in PISA plays the role that `NUM_THREADS` would be supposed to play) to the value of `PISA_NUM_THREADS`. Is there a way to limit the number of threads Numba (or NumPy) uses without setting `NUMBA_NUM_THREADS` (`OMP_NUM_THREADS` or `MKL_NUM_THREADS`) as an environment variable?
Ah, maybe I misunderstood, and the goal is not just to set the globals but to add e.g. `OMP_NUM_THREADS` to `os.environ` from within Python and pass that around somehow, hoping that any OMP routines respect the value?
Yes, exactly. If the user specifies a PISA environment variable like (from a Bash command line) `export PISA_NUM_THREADS=1`, then PISA tries to make that one variable force all (known) software underlying it to use that many threads, too. This is done by PISA setting e.g. `OMP_NUM_THREADS`, etc., to the same value the user specified for `PISA_NUM_THREADS`... but these other env vars are only set by PISA if they are not already defined in the user's environment.
In this way, I think it does the most sensible and user-friendly thing. You don't end up with Intel MKL, Numba, OpenMP, etc. unknowingly taking over all CPU cores when you run PISA, when you thought you'd told PISA to use just 1 thread. However, you can also modify the behavior of the underlying thread-enabled software yourself, by setting their env vars explicitly, and so you can get any threading behavior you want.
Similar logic is now used for the `PISA_CACHE_DIR` and `NUMBA_CACHE_DIR` env vars.
How exactly is `NUMBA_CACHE_DIR` carried over to e.g. `utils/numba_tools.py`? Or was that supposed to happen via `from pisa import numba_jit`, with the latter then knowing about `NUMBA_CACHE_DIR` because it is part of the environment at import time?
Any import from PISA (or a sub-module of PISA) will execute `$PISA/pisa/__init__.py`, and once that has run, `NUMBA_CACHE_DIR` will have been set appropriately. I think that env vars like `NUMBA_CACHE_DIR` need to be set before the first Numba import, but I'm not quite sure about that...
But if that is the case, then the rule is that a module just needs to import something from `pisa`, from `pisa.foo`, etc., before doing anything with Numba. And if that's true, then `numba_tools.py` does violate this assumption. Tagging @philippeller here to see if he has thought this through.
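If that rule holds, a defensive check along these lines could catch violations early (the guard function and the fallback cache-dir path are hypothetical, for illustration only):

```python
import os
import sys

def ensure_env_set_before_numba():
    """Hypothetical guard: fail loudly if Numba was imported before
    pisa/__init__.py had a chance to set NUMBA_CACHE_DIR."""
    if "numba" in sys.modules and "NUMBA_CACHE_DIR" not in os.environ:
        raise RuntimeError(
            "Numba was imported before NUMBA_CACHE_DIR was set; "
            "import pisa (or a pisa submodule) first"
        )

# What pisa/__init__.py would do (the path is a placeholder):
os.environ.setdefault("NUMBA_CACHE_DIR", "/tmp/numba_cache")
ensure_env_set_before_numba()  # passes: the env var is set before any Numba import
```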
To me it looks like, if we wanted to make sure NumPy respected `OMP_NUM_THREADS`, we couldn't just `import numpy as np` in any module, but would have to import it from `pisa` (adding `OMP_NUM_THREADS` to `os.environ` in `__init__.py` before `numpy` is imported has no effect on NumPy routines I run within some module when the module does `import numpy`).
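One way to make the "set before first import" condition testable (a sketch of an experiment, not PISA code) is to launch a fresh interpreter with the variable already in its environment, so it is guaranteed to predate the first `import numpy`:

```python
import os
import subprocess
import sys

# Run a child interpreter whose environment has OMP_NUM_THREADS set before
# any import happens. Here the child only echoes the variable; in a real
# experiment it would import numpy and time a large np.dot call to observe
# the threading behaviour.
env = dict(os.environ, OMP_NUM_THREADS="1")
child_code = "import os; print(os.environ['OMP_NUM_THREADS'])"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # → 1
```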
It doesn't seem to be that easy either -.- Not sure what's going on. Anyway...
We should first test how the `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, and `NUMBA_NUM_THREADS` env vars are treated (do they actually need to be set prior to the first import?). If they do need to be set prior to the first import, there's e.g. the following solution for setting the number of MKL threads; maybe there are equivalents for the others:
https://stackoverflow.com/questions/28283112/using-mkl-set-num-threads-with-numpy#28293128
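The approach in that Stack Overflow answer calls MKL's C API at runtime, which works even after NumPy has been imported. A hedged sketch (it assumes an MKL runtime library such as `libmkl_rt.so` is actually present; the library name varies by platform, and the function simply reports failure when MKL is unavailable):

```python
import ctypes

def set_mkl_threads(num_threads):
    """Try to call mkl_set_num_threads from the MKL runtime library.
    Returns True on success, False if MKL is not available."""
    try:
        mkl_rt = ctypes.CDLL("libmkl_rt.so")  # name is platform-dependent
    except OSError:
        return False
    # mkl_set_num_threads takes a pointer to a C int
    mkl_rt.mkl_set_num_threads(ctypes.byref(ctypes.c_int(num_threads)))
    return True
```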
That is what a small interactive test suggested to me (setting `os.environ['OMP_NUM_THREADS']` before and after importing NumPy, then running `np.dot` and observing its threading behaviour), but I couldn't figure out how to tell NumPy anywhere throughout PISA to respect `OMP_NUM_THREADS`. I'll take a look at the link above and keep poking at it.
NumPy doesn't use threading explicitly, but does so implicitly if the BLAS library it employs does. E.g., NumPy "uses" `MKL_NUM_THREADS` if it is compiled with MKL (the Intel BLAS lib). This is the case, e.g., if you're using the Anaconda distribution of Python (I think it's the default version of NumPy Continuum gives you). I'm not sure whether other BLAS libraries (OpenBLAS, ATLAS, etc. are others I've heard of) use OpenMP threading or something else.
Sorry, I didn't want to get into a discussion about which variables NumPy respects, but it would seem that you can set `OMP_NUM_THREADS` to limit NumPy's implicit threading in Anaconda even when it's compiled with MKL. That's what I've observed (and what https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications seems to suggest, too?). I actually can't successfully limit the number of threads by setting `MKL_NUM_THREADS` in `os.environ` before importing NumPy, even though the latter is compiled with MKL.
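To see which of these variables even applies on a given machine, it can help to check which BLAS the local NumPy build is linked against, using NumPy's own build-info API:

```python
import numpy as np

# Print the BLAS/LAPACK build configuration of this NumPy installation;
# an MKL-linked build is the case where MKL_NUM_THREADS is relevant.
np.__config__.show()
```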