`import xdem` is >2 times slower than most packages

Open erikmannerfelt opened this issue 3 years ago • 7 comments

Importing xdem takes much longer than most popular packages, presumably because of the many import statements and because all modules are imported in __init__.py.

Some comparisons on my laptop:

❯ time python -c "import os"
python -c "import os"  0.02s user 0.00s system 97% cpu 0.026 total

❯ time python -c "import rasterio" 
python -c "import rasterio"  0.78s user 0.72s system 786% cpu 0.190 total

❯ time python -c "import numpy"
python -c "import numpy"  0.67s user 0.74s system 966% cpu 0.146 total

❯ time python -c "import matplotlib"
python -c "import matplotlib"  0.81s user 0.72s system 769% cpu 0.200 total

❯ time python -c "import xdem" 
python -c "import xdem"  2.16s user 1.13s system 100% cpu 3.283 total

As you can see, importing xdem takes at least 2-3 times longer than other packages.

Basically, all modules are imported in xdem, meaning all import statements are run. I excluded those and ran only the non-xdem import statements:

❯ time (grep -rh "import " xdem/ --exclude __init__.py | grep -vE "xdem|__future__" | sed -e 's/  //g' -e 's/#.*//g' | sort | uniq | tr '\n' ';' | python)
1.83s user 0.80s system 202% cpu 1.298 total

It seems like xdem's own overhead is on the order of 300 ms, while the third-party imports account for most of the time.

Are there established solutions for this? Of course, we could stop importing every module when importing xdem, as rasterio does with rasterio.warp, rasterio.features and others. I do however find the multiple import statements super annoying, so it would be cool if there was another approach. Does anyone have any ideas?
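One established option is a module-level `__getattr__` (PEP 562, Python 3.7+), which keeps the `xdem.submodule` syntax but defers the submodule import until first access. The sketch below is only an illustration: it builds a throwaway package called `lazydemo` with a submodule `heavy` (both names are hypothetical stand-ins, not part of xdem) and shows that the submodule is loaded only when it is first touched.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Contents of the hypothetical lazydemo/__init__.py.
INIT_SOURCE = textwrap.dedent('''
    import importlib

    # Submodules that are imported only on first attribute access.
    _LAZY_SUBMODULES = {"heavy"}

    def __getattr__(name):
        # PEP 562: called only when normal attribute lookup on the module fails.
        if name in _LAZY_SUBMODULES:
            module = importlib.import_module(f"{__name__}.{name}")
            globals()[name] = module  # cache so __getattr__ is not hit again
            return module
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')


def build_demo_package(root: Path) -> None:
    """Write the throwaway package to disk."""
    pkg = root / "lazydemo"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "heavy.py").write_text("VALUE = 42\n")


tmp = tempfile.mkdtemp()
build_demo_package(Path(tmp))
sys.path.insert(0, tmp)

import lazydemo  # runs only lazydemo/__init__.py

loaded_before = "lazydemo.heavy" in sys.modules
value = lazydemo.heavy.VALUE  # first access triggers the real import
loaded_after = "lazydemo.heavy" in sys.modules

print(loaded_before, value, loaded_after)
```

The trade-off is that an import error in a lazy submodule only surfaces at first use, not at `import` time.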

Also a disclaimer: These times varied slightly depending on when I ran them, so they should just be seen as indications of relative speed.

erikmannerfelt avatar May 12 '21 09:05 erikmannerfelt

Back from holidays, trying to keep up with all the enhancements!

On this: I think that an import of 2 seconds is acceptable. The cost is paid only once, at the start of a program, and not repeatedly, so it won't affect performance much. As for your questions, unfortunately I don't think I have any valuable knowledge or abilities to help improve this :/

rhugonnet avatar May 17 '21 10:05 rhugonnet

I agree that it's not a top priority, @rhugonnet, and it's just a small annoyance for me right now. We should just make sure not to make it much worse, though! A benchmark test could for example guard against this, so that tests fail if a change suddenly pushes the import to 10 s instead of 2:

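import subprocess
import sys
import time

import numpy as np

# Time a fresh interpreter for each run so that nothing is cached in sys.modules.
durations = []
for _ in range(5):
    start_time = time.perf_counter()
    subprocess.run([sys.executable, "-c", "import xdem"], check=True)
    durations.append(time.perf_counter() - start_time)

# Fail if the mean import time exceeds 3 seconds.
assert np.mean(durations) < 3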

On my laptop, this gives me a 1.38±0.03 s import time.

erikmannerfelt avatar May 17 '21 10:05 erikmannerfelt

Reviving this very old issue because I also sometimes feel annoyed by the slow import time... I came across this and thought it could be useful for looking at the issue in more detail: https://stackoverflow.com/a/52090274. Basically, one can run python -X importtime -c 'import xdem' to find the slow parts. I don't really understand all the output though.
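To make that output easier to read, here is a rough sketch that runs `-X importtime` in a subprocess and sorts the entries by cumulative time. It assumes the usual `import time: self [us] | cumulative | imported package` line format written to stderr. The helper names `parse_importtime` and `slowest_imports` are made up for this example, and `json` is used as a stand-in so the snippet runs without xdem installed.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Return (name, self_us, cumulative_us) tuples from -X importtime output."""
    rows = []
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (part.strip() for part in parts)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skip the header line
        # Stripping the name drops the indentation that marks nested imports.
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest_imports(module, top=10):
    """Import `module` in a fresh interpreter and list its slowest imports."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = parse_importtime(result.stderr)
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


# Replace "json" with "xdem" to inspect the real package.
for name, self_us, cumulative_us in slowest_imports("json", top=5):
    print(f"{cumulative_us / 1e6:8.3f} s cumulative  {self_us / 1e6:8.3f} s self  {name}")
```

The cumulative column includes everything a module imports in turn, so a parent package always ranks at least as high as its slowest child.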

adehecq avatar Jul 25 '23 12:07 adehecq

I'm not sure I get everything either, but from the output it looks like we accumulate the imports from our dependencies. Among the longest ones are numba, geopandas, scikit-learn and scikit-gstat. If you run the same command that you showed above for each of those independently, you get:

import time: 595 | 205037 | numba, so 0.205 s
import time: 187 | 272283 | geopandas, so 0.272 s
import time: 291 | 810315 | skgstat, so 0.810 s
import time: 213 | 313470 | sklearn, so 0.313 s

which seems to pretty much add up to the ~1.3s import for xdem (some of these packages have the same underlying dependencies like numpy, so the total import is a bit faster than the sum of the times above)

I'm not sure if we can speed this up by importing only part of those packages? I tried that with importtime, but it didn't seem to have an effect :/

rhugonnet avatar Jul 26 '23 02:07 rhugonnet

How about making some of the non-core dependencies lazy? At least skgstat (which may or may not contribute to 0.8 s of import time) is only used for spatial statistics. For anything else, it's imported in vain. Either it can be imported only in the functions where it's used, or we can use lazy_import. I've seen that in other projects but I can't remember where...
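For reference, the standard library can do this without an extra dependency via `importlib.util.LazyLoader`. The sketch below follows the lazy-import recipe in the importlib documentation. The helper name `lazy_import` is just a label for this example (it is not the third-party package of the same name), and `colorsys` stands in for a heavy dependency such as skgstat so the snippet runs anywhere.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code only executes on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported, nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader so that exec_module() is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # returns immediately, the module body has not run yet
    return module


# Stand-in for something like: skgstat = lazy_import("skgstat")
colorsys = lazy_import("colorsys")

# The module body runs here, on the first attribute access.
hsv = colorsys.rgb_to_hsv(1, 0, 0)
print(hsv)
```

The simpler alternative is a plain `import skgstat` inside the functions that need it. Either way, a missing or broken dependency is only reported at first use.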

erikmannerfelt avatar Jul 27 '23 19:07 erikmannerfelt

> I'm not sure if we can speed this up by importing only part of those packages? I tried that with importtime, but it didn't seem to have an effect :/

If Y is a subpackage, from X import Y will run both X/__init__.py and X/Y/__init__.py. So that syntax only saves time if the root __init__.py does not already import the heavy submodules. In skgstat, for example, it seems like basically everything is imported in the root __init__.py, so importing submodules won't really change anything.

I had to double check this behaviour so I'll just paste a minimal example: I made a library with a submodule:

# library/__init__.py
print("Base imported")

# library/sublibrary/__init__.py
print("Sub imported")

>>> from library import sublibrary
Base imported
Sub imported
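If anyone wants to reproduce this without creating the files by hand, here is a small self-contained sketch. It writes the same two `__init__.py` files to a temporary directory and runs the import in a fresh interpreter. The helper name `run_import` is made up for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_import(statement):
    """Run `statement` against a throwaway library/sublibrary package.

    Returns the lines printed while the import executes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sub = root / "library" / "sublibrary"
        sub.mkdir(parents=True)
        (root / "library" / "__init__.py").write_text('print("Base imported")\n')
        (sub / "__init__.py").write_text('print("Sub imported")\n')
        # Put the temporary directory on sys.path inside the child interpreter.
        code = f"import sys; sys.path.insert(0, {str(root)!r}); {statement}"
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.splitlines()


lines = run_import("from library import sublibrary")
print(lines)
```

Running it with `import library.sublibrary` instead prints the same two lines, since the parent package is always initialised first.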

erikmannerfelt avatar Jul 27 '23 19:07 erikmannerfelt

Good to know for the from ... import ... syntax!! :smile:

If a lazy solution works, why not. But I'm not so sure about lazy_import: the package seems old and unmaintained (last release 6 years ago; maybe core Python is stable enough that it doesn't need any updates?). It also adds an extra dependency.

If it comes mostly from scikit-gstat and its base __init__, we could also open a PR there?

rhugonnet avatar Jul 27 '23 22:07 rhugonnet