numpy
BUG: Improve Numpy startup time
Describe the issue:
Analysing the import time of numpy using importtime-waterfall shows that some parts can be improved. The import time varies with the platform (Windows/Linux) and whether numpy is installed in development mode or as a regular install.
- For development mode a bottleneck is the import of numpy.version, which calls the expensive git_pieces_from_vcs. This can take 200-400 ms.
- For Windows a bottleneck is the usage of platform.system() in _add_newdocs_scalars. A PR to improve this is #22060.
- In numpy.compat._pep440 some regular expressions are compiled, but they are not used in the numpy import. They could be replaced with something like:
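import re

_legacy_version_component_re = None

def _parse_version_parts(s):
    # Compile lazily on first use so that importing the module stays cheap.
    # The cache lives at module level, so it needs `global`, not `nonlocal`.
    global _legacy_version_component_re
    if _legacy_version_component_re is None:
        _legacy_version_component_re = re.compile(
            r"(\d+ | [a-z]+ | \.| -)", re.VERBOSE)
    for part in _legacy_version_component_re.split(s):
        ...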
- numpy.core._multiarray_umath takes quite long to import (12 ms). There is some initialization going on (see PyInit__multiarray_umath), but without profiling it is not clear what is taking the time.
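Per-module numbers like the ones in the list above can also be collected without extra tooling, using CPython's built-in `-X importtime` option (available since Python 3.7). The following is a minimal sketch, not the method used for this report: the helper name `import_times` is hypothetical, and it assumes the usual `import time: self | cumulative | name` line format on stderr.

```python
import subprocess
import sys


def import_times(module):
    """Return [(cumulative_us, self_us, name), ...] for importing `module`.

    The list is sorted with the largest cumulative time first.
    """
    # -X importtime makes the interpreter report per-module timings on stderr.
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        try:
            self_us, cumulative_us = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # header line, which has no numeric columns
        rows.append((cumulative_us, self_us, fields[2].strip()))
    return sorted(rows, reverse=True)
```

Calling `import_times("numpy")[:15]` in an environment where numpy is installed should list the slowest imports first; a fresh interpreter is started on each call, so results are not affected by modules already cached in the calling process.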
Reproduce the code example:
None
Error message:
None
NumPy/Python version information:
Not relevant
The slow execution of git_pieces_from_vcs is mainly due to the execution of a git describe command. This can be slow if there has been a large number of commits since the latest tag (see https://lkml.iu.edu/hypermail/linux/kernel/1210.3/02531.html).
When testing, the result of git describe is v1.24.0.dev0-580-g0ef197268, indicating that there are 580 commits since the latest tag. Execution time is about 160 ms:
eendebakpt@woelmuis:/mnt/data/numpy$ time git describe --tags --dirty=
v1.24.0.dev0-580-g0ef197268
real 0m0,165s
user 0m0,154s
sys 0m0,016s
Adding an artificial tag using git tag -a tag_test -m "Testing tag" indeed results in much faster execution of git describe:
eendebakpt@woelmuis:/mnt/data/numpy$ git tag -a tag_test -m "Testing tag"
eendebakpt@woelmuis:/mnt/data/numpy$ time git describe --tags --dirty= --always --long
tag_test-0-g0ef197268
real 0m0,019s
user 0m0,020s
sys 0m0,003s
This is only an issue for developers, not for normal users.
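To repeat this comparison from Python rather than with the shell's time builtin, a small timing helper is enough. This is a rough sketch under stated assumptions: `time_command` and `time_git_describe` are hypothetical names, the flags are copied from the second command above, and a single run is measured, so the numbers will be noisy.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run `cmd` once and return (elapsed_seconds, stripped_stdout)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, proc.stdout.strip()


def time_git_describe(repo_dir):
    """Time `git describe` in `repo_dir` (requires git and a git checkout)."""
    cmd = ["git", "describe", "--tags", "--dirty=", "--always", "--long"]
    return time_command(cmd, cwd=repo_dir)
```

Running `time_git_describe` on a checkout before and after adding a nearby tag should show the same kind of difference as the shell timings above.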
A profiling result for the import part of multiarray umath:

The import of the datetime module is something that will probably happen anyway. initumath is expensive, with a rather large part spent in the legacy method add_and_return_legacy_wrapping_ufunc_loop. @seberg Is this something we still need?
Still very much relevant. We are creating an object for every single loop, which I guess takes some time.
In principle some/all of this could be delayed to the first use for NumPy loops, but if you look at the code, you will see that there is a comment about needing to do that in case there is loop ambiguity 'OO->?' and 'OO->O'.
So yes, relevant. Could something be done about it? Maybe, but it isn't super clear to me that it would be easy, unfortunately.
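As a rough illustration of what "delayed to the first use" could look like in general, here is a pure-Python sketch of a table that builds each wrapper object only when it is first looked up. It is not NumPy's C implementation: `LazyLoopTable` and its methods are hypothetical names, and the sketch deliberately ignores the ambiguity between 'OO->?' and 'OO->O' mentioned above, which is the reason deferral may not be straightforward in practice.

```python
class LazyLoopTable:
    """Hypothetical registry whose wrapper objects are built on first lookup."""

    def __init__(self, factories):
        # `factories` maps a signature string such as "dd->d" to a
        # zero-argument callable that builds the (expensive) wrapper object.
        self._factories = dict(factories)
        self._built = {}

    def get(self, signature):
        # Build the wrapper the first time it is requested, then reuse it.
        if signature not in self._built:
            self._built[signature] = self._factories[signature]()
        return self._built[signature]

    def built_count(self):
        # Number of wrappers that have actually been created so far.
        return len(self._built)
```

With this layout, creating the table costs one dictionary copy, and the per-loop object creation is paid only for signatures that are actually requested.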
The easy items have been addressed, so I will close the issue.