ACT
ACT copied to clipboard
Speed up import time to load ACT
I think we need to pay more attention to the way we load ACT modules. Currently it takes about 8 seconds to load ACT. Also it would be better to not load modules that are not used. For example the cartopy module is loaded for any plot created even if it is not used. I think we should put that module import under the GeographicPlotDisplay class so it is only imported when that class is loaded.
Not sure how this is typically done so if someone has suggestions I'm happy to implement.
@rcjackson Is moving the imports into the classes a good way to get around the import time?
It sounds reasonable to me
Once you import the modules once they don't load again until you restart. So it won't really matter if we put them in the classes or not, I suspect we will see the same performance either way.
From: Adam Theisen [email protected] Sent: Thursday, May 23, 2019 2:49:59 PM To: ANL-DIGR/ACT Cc: Jackson, Robert; Mention Subject: Re: [ANL-DIGR/ACT] Speed up import time to load ACT (#62)
@rcjacksonhttps://github.com/rcjackson Is moving the imports into the classes a good way to get around the import time?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ANL-DIGR/ACT/issues/62?email_source=notifications&email_token=AFIQA5DDD55ZUR65ZUIDG4LPW3YOPA5CNFSM4HNX35PKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWDJUGQ#issuecomment-495360538, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AFIQA5CSCHJJRGWD3ACRFPLPW3YOPANCNFSM4HNX35PA.
OK. I am having a bit of issues with using ACT on my local machine. I can't get cartopy installed locally. Since the module is not used with time series plotting I can't do that. Should we work on only requiring the modules that get used so the repo is useable by everyone?
@kenkehoe If you can move the imports around and make it work for yourself without causing functionality issues, please go ahead.
I'm doing some testing and will just post some results in here.
First import try was 7.9 seconds, after that the average was about 1.4 seconds
The single largest time sync seemed to be importing mpcalc from metpy which adds about 0.4 seconds to the import
I'll run through the code and see if I can find some changes, but some of the imports you only need to import certain functions, so instead of: import xarray as xr,
if your only using open_mfdataset: from xarray import open_mfdataset
Importing just needed functions might possibly speed up the code, and also in the init's there is full module imports, like in the io init.py there is: from . import armfiles this can be changed to: from .armfiles import read_netcdf etc.
Actually nvm on the first half, after more research and local test it seems that import specific functions won't really effect speed...
import time: self [us] | cumulative | imported package import time: 1975 | 119753 | act.plotting.SkewTDisplay import time: 2569 | 2569 | cartopy.io.img_tiles import time: 2082 | 4650 | act.plotting.XSectionDisplay import time: 2155 | 2155 | act.plotting.GeoDisplay import time: 2125 | 2125 | act.plotting.HistogramDisplay import time: 2573 | 1217256 | act.plotting import time: 2528 | 2528 | act.corrections.common import time: 1918 | 1918 | act.corrections.ceil import time: 1975 | 1975 | act.corrections.mpl import time: 3180 | 9599 | act.corrections import time: 2528 | 2528 | act.tests.sample_files import time: 2928 | 5455 | act.tests import time: 2473 | 2473 | act.discovery.get_armfiles import time: 2965 | 5437 | act.discovery import time: 237 | 237 | metpy import time: 57 | 293 | metpy.calc import time: 2350 | 2642 | act.retrievals.stability_indices import time: 2986 | 5627 | act.retrievals import time: 2182 | 2182 | act._version import time: 242112 | 3633517 | act
Might help narrow it down.
@kenkehoe it seems to me that this is importing in an ok amount of time (2-4 seconds). Do you still see issues at all?
It's OK. I'll close it.
@zssherman we've been asked to look into this issue again as the import times may be causing issues with the DQ processing.
@AdamTheisen Gotcha, what were we using previous to check import times for act?
@zssherman I think you ran some tests in the past but I was using the newer python -X importtime script.py. @kenkehoe was going to help and run this on their systems/environments.
@AdamTheisen Gotcha, yeah if he runs it on their system, I can get a better idea of where the hold up is in the code.
System | Self Time [us] | Cumulative Time [us]
procnode1-dev (i.e. mercury) | 700,725 | 5,145,136 emerald | 1,470,465 | 11,659,648 emerald (using local disk only) | 687,683 | 2,826,976 dq1 (i.e. gold) | 1,186 | 2,335,415 my Mac laptop | 129,503 | 4,654,856
Thanks @kenkehoe ! That's an interesting range of times. Do you have the full output at all? That will help narrow down the areas taking the longest
Ummm, yeah I can get the full output too. It's very long. I was trying to think of ways to summarize it. I'll look into it today.
Here is profile how long each of the imports takes

Currently, we import everything at the top level:
import act
The only way to speed things up would be to change the api a bit, essentially requiring the user to import what they need. For example, if someone is doing io/corrections, they would do
import act.corrections
import act.io
or
from act.io import read_netcdf
from act.corrections import correct_ceil
For MetPy, at the top level, they import the basics (essentially just the xarray accessor), and require the user to import the rest when they are working with different parts of the package. For example:
from metpy.cbook import get_test_data
from metpy.plots import MapPanel, PanelContainer
Perhaps we could discuss this more in detail at the next ACT meeting. This would be a pretty big change, but it is the only way to get around the import time issue. There aren't any classes, moving imports, etc. we can change to magically fix this - there is a tradeoff between importing everything in one line, and importing things as the user needs them.
This definitely needs more discussion but it might be worth it in the long run if we expect ACT to continue to grow in size. Or we work on excluding the test files and data first and then go from there. Definitely open, but we would want to run this by other users. I would be curious what @maxwellevin would have to say with the other projects that use ACT and if the import time is an issue.
I agree this is worth discussing and testing. Moving the testing files out of the installed repo and improving the speed would be worth the effort to move ACT inline with other highly used and regarded repos. Now is our chance to make these changes that could affect usage. Making the change in the future will just be more difficult. Would it be possible to go the numpy route with both options working but issuing warnings for the full import at the start and then transitioning to not importing everything at the start in a few versions to give people time to make changes now?
@AdamTheisen for tsdat
we haven't thought much about optimizing module loading, but we definitely will at some point. I don't know much about how to improve it outside of what has already been suggested here, so I am definitely interested in learning more about this. For kicks I ran the same thing as @mgrover1 (I think?) to generate tsdat's import profile and it looks pretty evenly split between xarray
and act
. I'm certain there are things we could be doing from the tsdat
side of things to mitigate this, I just don't know how yet.
code
pip install tsdat tuna
python -X importtime -c "import tsdat" 2 > import.log
tuna import.log

I like @mgrover1's suggestion of changing how modules are loaded to improve performance in most cases. The downside, as already mentioned, is that the API does change quite a bit since you now have to manually import the submodules you need. That can be a bit annoying if your workflow involves using all the act
submodules, but I think most people only use one or two immediately.
For tsdat
we are currently only actually using the act.utils
module directly, but we spend a good chunk of time importing the act.discovery
and act.plotting
modules. I've seen the split submodule approach in other libraries too – rich
and scipy
come to mind for me, but I'm sure there's more.
@AdamTheisen @kenkehoe At SciPy last week, the Scientific Python group mentioned they have a list of SPECs for the community - including one on lazy loading of submodules (which is what we are aiming for here), speeding up import times.
Here is the SPEC - https://scientific-python.org/specs/spec-0001/
It is worth looking into this strategy + method to see if we can use it here.
@mgrover1 this looks like a good option. Would take some initial effort up front to implement but could be worth it