antropy
Speed up importing antropy
Create a file called `import.py` with the single line `import antropy`. On my machine (Linux VM), this takes at least 10 seconds to run.
Using pyinstrument tells me that most of the time is spent importing numba. Is there any possibility of speeding this up? It seems this is a known issue with numba, though: see e.g. https://github.com/numba/numba/issues/4927.
```
$ pyinstrument import.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 16:36:28  Samples:  7842
 /_//_/// /_\ / //_// / //_'/ //     Duration: 12.368    CPU time: 11.963
/   _/                      v3.4.1

Program: import.py

12.368 <module>  import.py:1
└─ 12.368 <module>  antropy/__init__.py:2
   ├─ 6.711 <module>  antropy/fractal.py:1
   │  └─ 6.711 wrapper  numba/core/decorators.py:191
   │        [14277 frames hidden]  numba, llvmlite, contextlib, pickle, ...
   ├─ 3.034 <module>  antropy/entropy.py:1
   │  ├─ 2.390 wrapper  numba/core/decorators.py:191
   │  │     [5009 frames hidden]  numba, abc, llvmlite, inspect, contex...
   │  └─ 0.522 <module>  sklearn/__init__.py:14
   │        [374 frames hidden]  sklearn, scipy, inspect, enum, numpy,...
   └─ 2.618 <module>  antropy/utils.py:1
      ├─ 1.584 wrapper  numba/core/decorators.py:191
      │     [5027 frames hidden]  numba, abc, functools, llvmlite, insp...
      ├─ 0.895 <module>  numba/__init__.py:3
      │     [1444 frames hidden]  numba, llvmlite, pkg_resources, warni...
      └─ 0.138 <module>  numpy/__init__.py:106
            [190 frames hidden]  numpy, pathlib, urllib, collections, ...

To view this report with different options, run:
    pyinstrument --load-prev 2021-06-17T16-36-28 [options]
```
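For completeness, Python's built-in `-X importtime` flag gives a similar per-module breakdown without installing anything extra. A quick sketch (the `sort`/`tail` filtering is just my way of surfacing the slowest imports):

```shell
# Write a per-module import-time report to a log (the report goes to stderr)
python -X importtime -c "import antropy" 2> importtime.log

# Columns are "self us | cumulative us | module"; show the biggest offenders
sort -t'|' -k2 -n importtime.log | tail -n 10
```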
Hi @jftsang,
Unfortunately, I don't think there's anything we can do to fix this. Numba greatly improves the computation time of some functions in antropy, and the small cost of that is a longer import time. I think even 10 seconds is reasonable. Let's hope that future versions of Numba will improve this.
Thanks, Raphael
Hi @raphaelvallat, just another thought: would it be possible to make the use of the `@jit`-compiled functions optional? I wonder what the trade-off would be depending on the scale of the data: for a short array, is there much to gain over plain old NumPy?
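Something along these lines is what I have in mind — just a sketch (the names are mine, not antropy's): a wrapper that applies `numba.jit` only when Numba is present and not disabled, so import stays cheap otherwise.

```python
import os


def maybe_jit(**jit_kwargs):
    """Apply numba.jit only when Numba is available and JIT is enabled.

    Falls back to the plain Python function when Numba is missing or
    NUMBA_DISABLE_JIT=1 is set, avoiding compilation cost entirely.
    """
    def decorate(func):
        if os.environ.get("NUMBA_DISABLE_JIT") == "1":
            return func
        try:
            import numba
        except ImportError:
            return func
        return numba.jit(**jit_kwargs)(func)
    return decorate


@maybe_jit(nopython=True)
def hypot2(a, b):
    # Toy stand-in for one of antropy's numerical kernels
    return a * a + b * b
```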
Jonny
Hi @jftsang,
I believe you can set the `NUMBA_DISABLE_JIT=1` environment variable to disable JIT compilation (see https://numba.pydata.org/numba-doc/dev/reference/envvars.html).
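If you'd rather do it from Python than the shell, the variable just has to be set before Numba is first imported — a minimal sketch:

```python
import os

# Must run before numba is imported anywhere in the process,
# otherwise the setting has no effect.
os.environ["NUMBA_DISABLE_JIT"] = "1"

# import antropy  # would now import without triggering JIT compilation
```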
That seems to work well. I shall do some profiling to estimate the scales at which `@jit`-ting becomes worthwhile.
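For the record, here's the shape of the benchmark I'm planning — pure stdlib, with a toy stand-in function (swap in a real antropy call, run with and without `NUMBA_DISABLE_JIT=1`, and compare):

```python
import time


def best_time(fn, data, repeats=5):
    """Best wall-clock time of fn(data) over several runs (reduces noise)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def toy_metric(xs):
    # Stand-in for an antropy function; replace with the real thing
    return sum(x * x for x in xs)


for n in (100, 10_000, 1_000_000):
    data = [float(i) for i in range(n)]
    print(f"n={n:>9}: {best_time(toy_metric, data):.6f} s")
```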