Speed up importing antropy

Open jftsang opened this issue 3 years ago • 4 comments

Create a file called import.py with the single line import antropy. On my machine (Linux VM), this takes at least 10 seconds to run.

Profiling with pyinstrument shows that most of the time is spent importing numba and compiling the decorated functions. Is there any possibility of speeding this up? It seems to be a known issue with numba, though: see e.g. https://github.com/numba/numba/issues/4927.

$ pyinstrument import.py 

  _     ._   __/__   _ _  _  _ _/_   Recorded: 16:36:28  Samples:  7842
 /_//_/// /_\ / //_// / //_'/ //     Duration: 12.368    CPU time: 11.963
/   _/                      v3.4.1

Program: import.py

12.368 <module>  import.py:1
└─ 12.368 <module>  antropy/__init__.py:2
   ├─ 6.711 <module>  antropy/fractal.py:1
   │  └─ 6.711 wrapper  numba/core/decorators.py:191
   │        [14277 frames hidden]  numba, llvmlite, contextlib, pickle, ...
   ├─ 3.034 <module>  antropy/entropy.py:1
   │  ├─ 2.390 wrapper  numba/core/decorators.py:191
   │  │     [5009 frames hidden]  numba, abc, llvmlite, inspect, contex...
   │  └─ 0.522 <module>  sklearn/__init__.py:14
   │        [374 frames hidden]  sklearn, scipy, inspect, enum, numpy,...
   └─ 2.618 <module>  antropy/utils.py:1
      ├─ 1.584 wrapper  numba/core/decorators.py:191
      │     [5027 frames hidden]  numba, abc, functools, llvmlite, insp...
      ├─ 0.895 <module>  numba/__init__.py:3
      │     [1444 frames hidden]  numba, llvmlite, pkg_resources, warni...
      └─ 0.138 <module>  numpy/__init__.py:106
            [190 frames hidden]  numpy, pathlib, urllib, collections, ...

To view this report with different options, run:
    pyinstrument --load-prev 2021-06-17T16-36-28 [options]
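
For a quick check without pyinstrument, the total import time can also be measured with the standard library alone. This is only a minimal sketch: the helper name time_import is made up for illustration, and it measures a single cold import in the current process, so it says nothing about where the time goes.

```python
import importlib
import sys
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds taken by the first import of module_name.

    Hypothetical helper for illustration; not part of antropy or numba.
    """
    # A module that is already cached in sys.modules imports almost instantly,
    # so only a first (cold) import gives a meaningful number.
    if module_name in sys.modules:
        raise RuntimeError(f"{module_name} is already imported; use a fresh interpreter")
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Example use in a fresh interpreter (assumes antropy is installed):
#     print(f"import antropy: {time_import('antropy'):.3f} s")
```

CPython's built-in `python -X importtime import.py` option gives a per-module breakdown as well, although it does not show time spent inside individual decorators the way the report above does.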

jftsang, Jun 17 '21 15:06

Hi @jftsang,

Unfortunately, I don't think there's anything we can do to fix this. Numba greatly reduces the computation time of some functions in antropy, and the small cost of that is a longer import time. I think even 10 seconds is definitely reasonable. Let's hope that future versions of Numba will improve this.

Thanks, Raphael

raphaelvallat, Jun 17 '21 21:06

Hi @raphaelvallat, just another thought - would it be possible to make the use of the @jit-compiled functions optional? I wonder what the tradeoff would be depending on the scale of the data: for a short array, is there much to gain over plain old numpy?

Jonny

jftsang, Aug 05 '22 11:08

Hi @jftsang,

I believe you can set the NUMBA_DISABLE_JIT=1 environment variable (for example with export NUMBA_DISABLE_JIT=1 in your shell) to disable JIT (see https://numba.pydata.org/numba-doc/dev/reference/envvars.html).
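
If changing the shell environment is inconvenient, the same variable can probably be set from Python instead. The sketch below assumes that Numba reads NUMBA_DISABLE_JIT when it is first imported, so the variable has to be set before numba (or antropy) is imported anywhere in the process. The helper name disable_numba_jit is made up for illustration.

```python
import os
import sys


def disable_numba_jit() -> None:
    """Ask Numba to skip JIT compilation for the rest of this process.

    Hypothetical helper for illustration. Assumes Numba reads the variable at
    import time, so it refuses to run once numba is already loaded.
    """
    if "numba" in sys.modules:
        raise RuntimeError("numba is already imported; set NUMBA_DISABLE_JIT earlier")
    os.environ["NUMBA_DISABLE_JIT"] = "1"


disable_numba_jit()

# Import antropy only after the variable is set, for example:
#     import antropy
```

With JIT disabled, the decorated functions run as ordinary Python, so the faster import may be paid for with slower computation on large inputs.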

raphaelvallat, Aug 05 '22 22:08

That seems to work well. I shall do some profiling to estimate the scales at which @jit-ting becomes worthwhile.
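
A rough sketch of how such profiling might look, using only the standard library. The function zero_crossings is a hypothetical pure-Python stand-in for whichever antropy function is of interest, and the sizes are arbitrary. The idea would be to run the same script once normally and once with NUMBA_DISABLE_JIT=1, then compare the timings at each size.

```python
import random
import timeit
from typing import Callable, Dict, Iterable, List, Sequence


def zero_crossings(x: Sequence[float]) -> int:
    """Hypothetical stand-in workload: count sign changes between neighbours."""
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))


def benchmark(
    func: Callable[[List[float]], object],
    sizes: Iterable[int],
    repeat: int = 5,
) -> Dict[int, float]:
    """Return the best-of-repeat wall time in seconds of func for each input size."""
    rng = random.Random(0)  # fixed seed so every run sees the same data
    results: Dict[int, float] = {}
    for n in sizes:
        data = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        func(data)  # warm-up call so any one-off compilation is not timed
        results[n] = min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))
    return results


timings = benchmark(zero_crossings, sizes=[10, 100, 1_000, 10_000])
for n, seconds in timings.items():
    print(f"n={n:>6}: {seconds * 1e6:10.1f} us")
```

The warm-up call keeps compilation time out of the per-call figures, so the one-off compilation cost has to be weighed separately against the per-call savings.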

jftsang, Aug 13 '22 11:08