Optimize importing unyt

Open ngoldbaum opened this issue 7 years ago • 7 comments

As reported by @majj, importing unyt takes about one second. I suspect there isn't much we can do about this since we need to import both numpy and sympy.

ngoldbaum avatar Jul 13 '18 14:07 ngoldbaum

I've been digging into this since the problem has apparently gotten worse. Here's a graphical profile for import unyt (as of the latest master branch):

Screen Shot 2021-02-08 at 22 55 44

for reproduction:

$ python -X importtime import_unyt.py 2> import_unyt.log
$ tuna import_unyt.log

with

# import_unyt.py
import unyt

and using https://github.com/nschloe/tuna
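For a text-only view of the same data, the log can also be parsed directly. This is a minimal sketch, assuming the usual `import time: self [us] | cumulative | imported package` layout that CPython writes to stderr; the helper names `parse_importtime` and `import_times` are made up for illustration and are not part of unyt or tuna.

```python
import subprocess
import sys


def parse_importtime(log: str) -> list[tuple[int, str]]:
    """Turn `-X importtime` output into (cumulative_us, module) pairs, slowest first."""
    prefix = "import time:"
    rows = []
    for line in log.splitlines():
        if not line.startswith(prefix):
            continue
        fields = [f.strip() for f in line[len(prefix):].split("|")]
        # Skip the header row, whose "cumulative" column is not a number.
        if len(fields) != 3 or not fields[1].isdigit():
            continue
        rows.append((int(fields[1]), fields[2]))
    return sorted(rows, reverse=True)


def import_times(module: str) -> list[tuple[int, str]]:
    """Import `module` in a fresh interpreter so cached modules don't hide the cost."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(proc.stderr)
```

Something like `import_times("unyt")[:10]` would then list the ten entries with the largest cumulative time, in microseconds.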

So indeed, numpy and sympy are taking about a second to import (~0.75s in this particular run), but about 40% (0.9s) of the import time is spent importing unyt.physical_constants. I was able to narrow it down to this line: https://github.com/yt-project/unyt/blob/aaa2b244207c14c5f472214595a00201d105c0ae/unyt/unit_systems.py#L98

  • since no exception is actually caught in this try block when importing unyt, I tried removing the try block and keeping the call, which did not lead to any change I could measure with this (admittedly not very robust) profiling method
  • removing the call to Unit.in_base() altogether, however, leads to a significant gain: it reduces the total import time by 0.8s (~35%), so I think there's room for optimisation here. I may report back if I get a better idea of how to improve on this :)

neutrinoceros avatar Feb 08 '21 22:02 neutrinoceros

update: we can track it much more closely with

python -m cProfile -o log.pstats import_unyt.py
gprof2dot --colour-nodes-by-selftime log.pstats | dot -Tsvg -o out.svg

Where the most time-consuming island looks like this: Screenshot 2021-05-01 at 23 13 15

so it seems that 928533 deepcopies are performed at import time, which can very likely be optimised.

update: a naive "solution" to the problem is to remove all calls to copy.deepcopy in unyt.unit_object.Unit.__deepcopy__. Of course this isn't viable, but I can report that it saves 600ms of import time on my machine, which is more than double the import time of sympy and numpy combined. So the most problematic lines are https://github.com/yt-project/unyt/blob/aaa2b244207c14c5f472214595a00201d105c0ae/unyt/unit_object.py#L518#L519
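The call count can also be read without a graph, straight from the profiler. This is a rough sketch, with `count_calls` a hypothetical helper; it relies on the `stats` dictionary that `pstats.Stats` builds, which is widely used but not formally documented.

```python
import copy
import cProfile
import pstats


def count_calls(func, name: str, filename_suffix: str) -> int:
    """Run `func` under cProfile and count the calls made to functions called
    `name` that are defined in a file whose path ends with `filename_suffix`."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    # Maps (filename, lineno, funcname) -> (primitive calls, total calls,
    # tottime, cumtime, callers). Assumption: this layout stays stable.
    stats = pstats.Stats(profiler).stats
    total = 0
    for (filename, _lineno, func_name), (_primitive, ncalls, *_rest) in stats.items():
        if func_name == name and filename.endswith(filename_suffix):
            total += ncalls
    return total


# Stand-in workload: deep-copying a small nested list.
n_deepcopies = count_calls(
    lambda: copy.deepcopy([[1, 2], [3, 4]]), "deepcopy", "copy.py"
)
```

Replacing the stand-in workload with `lambda: importlib.import_module("unyt")` (in a fresh interpreter, and with `importlib` imported) should give a figure comparable to the one read off the graph.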

neutrinoceros avatar May 01 '21 21:05 neutrinoceros

unyt.physical_constants does seem to be the culprit for some incredibly slow import times I'm coming across (python -X importtime -c "import unyt" 2> tuna.log && tuna tuna.log)

image

mattwthompson avatar Jul 29 '21 21:07 mattwthompson

This no longer seems to be the case with unyt 2.9.0. Now sympy is the worst offender (it takes between 25 and 40% of unyt's import time on my machine), so there's actually not much room left to optimise import time within unyt itself. One minor optimisation that (I think) is still doable would be to delay importing matplotlib even when it's available. I will experiment with this idea and possibly propose we close this ticket.
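To illustrate what delaying an optional import can look like in general, here is a minimal sketch. It is not how unyt does it; `is_installed` and `get_optional` are hypothetical names.

```python
import importlib
import importlib.util


def is_installed(name: str) -> bool:
    """Report whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def get_optional(name: str):
    """Import an optional dependency on demand; return None if it is missing."""
    if not is_installed(name):
        return None
    # The import cost is paid here, on first use, not when the package loads.
    return importlib.import_module(name)
```

The idea is that a check like `is_installed("matplotlib")` is cheap enough to run at import time, while the call to `get_optional("matplotlib")` would sit inside whichever function actually needs plotting support.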

neutrinoceros avatar Jul 16 '22 08:07 neutrinoceros

With #250, #251 and #255 I think we should be able to close this

neutrinoceros avatar Jul 16 '22 09:07 neutrinoceros

For details, here's the state of things when combining the above optimisations:

Screenshot 2022-07-19 at 10 39 15

Here, sympy + numpy represent at least 50 to 60% of unyt's import time. Maybe there are still ways to reduce the remaining overhead, but it's hard to even profile correctly: Tuna doesn't display much detail for unit_symbols.py because not everything makes it into the profile data (by CPython's design), but I think that there is also a large fraction of it due to sympy.

neutrinoceros avatar Jul 19 '22 08:07 neutrinoceros

Found another simple optimisation, this time requiring a discussion upstream: https://github.com/sympy/sympy/pull/23804

neutrinoceros avatar Jul 19 '22 21:07 neutrinoceros