BUG: Improve Numpy startup time

Open eendebakpt opened this issue 3 years ago • 1 comments

Describe the issue:

Analysing the import time of numpy using importtime-waterfall shows that some parts can be improved. The import time varies with the platform (Windows/Linux) and with whether numpy is installed in development mode or as a regular install.
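A per-module breakdown like the one described above can also be obtained without extra tooling from CPython's built-in `-X importtime` option (Python 3.7+). The following is only a minimal sketch: `import_times` is a hypothetical helper name, not part of numpy or importtime-waterfall, and the parsing assumes the usual `import time: self | cumulative | name` report written to stderr.

```python
import subprocess
import sys


def import_times(module, top=5):
    """Return the `top` slowest imports as (cumulative_us, name) pairs.

    Hypothetical helper: runs `python -X importtime -c "import <module>"`
    in a fresh interpreter so nothing is already cached in sys.modules.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    # The report is written to stderr, one line per imported module:
    # "import time:   self_us | cumulative_us | name"
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        try:
            cumulative_us = int(fields[1])
        except ValueError:
            continue  # the header line has text instead of numbers
        rows.append((cumulative_us, fields[2].strip()))
    return sorted(rows, reverse=True)[:top]
```

Calling `import_times("numpy")` in an environment where numpy is installed lists the modules with the largest cumulative import time in microseconds. The numbers depend on platform and install mode, as noted above.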

  • For development mode a bottleneck is the import of numpy.version which calls the expensive git_pieces_from_vcs. This can take 200-400 ms.
  • For Windows a bottleneck is the usage of platform.system() in _add_newdocs_scalars. A PR to improve this is #22060.
  • In numpy.compat._pep440 some regular expressions are compiled at import time, but they are not used during the numpy import. They could be compiled lazily instead, with something like:
import re

_legacy_version_component_re = None

def _parse_version_parts(s):
    # The pattern is a module-level name, so it needs `global` (not `nonlocal`).
    global _legacy_version_component_re
    if _legacy_version_component_re is None:
        # Compile on first use instead of at import time.
        _legacy_version_component_re = re.compile(r"(\d+ | [a-z]+ | \.| -)", re.VERBOSE)

    for part in _legacy_version_component_re.split(s):
        ....
  • numpy.core._multiarray_umath takes quite long to import (12 ms). There is some initialization going on (see PyInit__multiarray_umath), but without profiling it is not clear what is taking the time.

Reproduce the code example:

None

Error message:

None

NumPy/Python version information:

Not relevant

eendebakpt avatar Jul 29 '22 18:07 eendebakpt

The slow execution of git_pieces_from_vcs is mainly due to execution of a git describe command. This can be slow if there has been a large number of commits since the latest tag (see https://lkml.iu.edu/hypermail/linux/kernel/1210.3/02531.html)

When testing, the result of git describe is v1.24.0.dev0-580-g0ef197268, indicating that there are 580 commits since the latest tag. Execution time is about 160 ms:

eendebakpt@woelmuis:/mnt/data/numpy$ time git describe --tags --dirty=
v1.24.0.dev0-580-g0ef197268

real	0m0,165s
user	0m0,154s
sys	0m0,016s

Adding an artificial tag using git tag -a tag_test -m "Testing tag" indeed results in much faster execution of git describe:

eendebakpt@woelmuis:/mnt/data/numpy$ git tag -a tag_test -m "Testing tag"
eendebakpt@woelmuis:/mnt/data/numpy$ time git describe --tags --dirty= --always --long
tag_test-0-g0ef197268

real	0m0,019s
user	0m0,020s
sys	0m0,003s

This is only an issue for developers, not for normal users.
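The shell timings above can be reproduced from Python with a small wrapper around `subprocess`. This is a rough sketch: `time_command` is a hypothetical helper, and the repository path in the usage note is a placeholder.

```python
import subprocess
import time


def time_command(cmd, cwd=None, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs of `cmd`.

    Hypothetical helper; taking the minimum reduces noise from a cold
    disk cache or other processes.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, cwd=cwd, capture_output=True, check=True)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `time_command(["git", "describe", "--tags", "--dirty="], cwd="/path/to/numpy")` measures the same command as the first transcript, and running it before and after creating a nearby tag shows the difference.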

eendebakpt avatar Aug 03 '22 19:08 eendebakpt

A profiling result for the import of _multiarray_umath:

image

The import of the datetime module is something that will probably happen anyway. initumath is expensive, with a rather large part of the time spent in the legacy function add_and_return_legacy_wrapping_ufunc_loop. @seberg Is this something we still need?

eendebakpt avatar Jan 13 '23 11:01 eendebakpt

Still very much relevant. We are creating an object for every single loop, which I guess takes some time.

In principle some/all of this could be delayed to the first use for NumPy loops, but if you look at the code, you will see that there is a comment about needing to do that in case there is loop ambiguity 'OO->?' and 'OO->O'.

So yes, relevant. Could something be done about it? Maybe, but it isn't super clear to me that it would be easy unfortunately.

seberg avatar Jan 13 '23 12:01 seberg

The easy items have been addressed, so I will close the issue.

eendebakpt avatar Jan 13 '23 12:01 eendebakpt