ACT Speed up import time to load ACT

I think we need to pay more attention to the way we load ACT modules. Currently it takes about 8 seconds to load ACT. Also it would be better to not load modules that are not used. For example the cartopy module is loaded for any plot created even if it is not used. I think we should put that module import under the GeographicPlotDisplay class so it is only imported when that class is loaded.

Not sure how this is typically done so if someone has suggestions I'm happy to implement.

May 17 '19 19:05 kenkehoe

@rcjackson Is moving the imports into the classes a good way to get around the import time?

May 23 '19 19:05 AdamTheisen

It sounds reasonable to me

May 23 '19 19:05 AdamTheisen

Once you import the modules once they don't load again until you restart. So it won't really matter if we put them in the classes or not, I suspect we will see the same performance either way.

From: Adam Theisen [email protected] Sent: Thursday, May 23, 2019 2:49:59 PM To: ANL-DIGR/ACT Cc: Jackson, Robert; Mention Subject: Re: [ANL-DIGR/ACT] Speed up import time to load ACT (#62)

@rcjacksonhttps://github.com/rcjackson Is moving the imports into the classes a good way to get around the import time?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ANL-DIGR/ACT/issues/62?email_source=notifications&email_token=AFIQA5DDD55ZUR65ZUIDG4LPW3YOPA5CNFSM4HNX35PKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWDJUGQ#issuecomment-495360538, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AFIQA5CSCHJJRGWD3ACRFPLPW3YOPANCNFSM4HNX35PA.

May 23 '19 20:05 rcjackson

OK. I am having a bit of issues with using ACT on my local machine. I can't get cartopy installed locally. Since the module is not used with time series plotting I can't do that. Should we work on only requiring the modules that get used so the repo is useable by everyone?

Jun 07 '19 18:06 kenkehoe

@kenkehoe If you can move the imports around and make it work for yourself without causing functionality issues, please go ahead.

Jun 27 '19 16:06 AdamTheisen

I'm doing some testing and will just post some results in here.

First import try was 7.9 seconds, after that the average was about 1.4 seconds

The single largest time sync seemed to be importing mpcalc from metpy which adds about 0.4 seconds to the import

Jun 28 '19 17:06 AdamTheisen

I'll run through the code and see if I can find some changes, but some of the imports you only need to import certain functions, so instead of: import xarray as xr,

if your only using open_mfdataset: from xarray import open_mfdataset

Importing just needed functions might possibly speed up the code, and also in the init's there is full module imports, like in the io init.py there is: from . import armfiles this can be changed to: from .armfiles import read_netcdf etc.

Jun 28 '19 18:06 zssherman

Actually nvm on the first half, after more research and local test it seems that import specific functions won't really effect speed...

Jun 28 '19 18:06 zssherman

import time: self [us] | cumulative | imported package import time: 1975 | 119753 | act.plotting.SkewTDisplay import time: 2569 | 2569 | cartopy.io.img_tiles import time: 2082 | 4650 | act.plotting.XSectionDisplay import time: 2155 | 2155 | act.plotting.GeoDisplay import time: 2125 | 2125 | act.plotting.HistogramDisplay import time: 2573 | 1217256 | act.plotting import time: 2528 | 2528 | act.corrections.common import time: 1918 | 1918 | act.corrections.ceil import time: 1975 | 1975 | act.corrections.mpl import time: 3180 | 9599 | act.corrections import time: 2528 | 2528 | act.tests.sample_files import time: 2928 | 5455 | act.tests import time: 2473 | 2473 | act.discovery.get_armfiles import time: 2965 | 5437 | act.discovery import time: 237 | 237 | metpy import time: 57 | 293 | metpy.calc import time: 2350 | 2642 | act.retrievals.stability_indices import time: 2986 | 5627 | act.retrievals import time: 2182 | 2182 | act._version import time: 242112 | 3633517 | act

Might help narrow it down.

Jun 28 '19 18:06 zssherman

@kenkehoe it seems to me that this is importing in an ok amount of time (2-4 seconds). Do you still see issues at all?

Jun 17 '21 16:06 AdamTheisen

It's OK. I'll close it.

Jun 17 '21 16:06 kenkehoe

@zssherman we've been asked to look into this issue again as the import times may be causing issues with the DQ processing.

Jan 10 '22 22:01 AdamTheisen

@AdamTheisen Gotcha, what were we using previous to check import times for act?

Jan 10 '22 22:01 zssherman

@zssherman I think you ran some tests in the past but I was using the newer python -X importtime script.py. @kenkehoe was going to help and run this on their systems/environments.

Jan 10 '22 22:01 AdamTheisen

@AdamTheisen Gotcha, yeah if he runs it on their system, I can get a better idea of where the hold up is in the code.

Jan 10 '22 23:01 zssherman

System | Self Time [us] | Cumulative Time [us]

procnode1-dev (i.e. mercury) | 700,725 | 5,145,136 emerald | 1,470,465 | 11,659,648 emerald (using local disk only) | 687,683 | 2,826,976 dq1 (i.e. gold) | 1,186 | 2,335,415 my Mac laptop | 129,503 | 4,654,856

Jan 11 '22 00:01 kenkehoe

Thanks @kenkehoe ! That's an interesting range of times. Do you have the full output at all? That will help narrow down the areas taking the longest

Jan 11 '22 02:01 AdamTheisen

Ummm, yeah I can get the full output too. It's very long. I was trying to think of ways to summarize it. I'll look into it today.

Jan 11 '22 14:01 kenkehoe

Here is profile how long each of the imports takes

Currently, we import everything at the top level:

import act

The only way to speed things up would be to change the api a bit, essentially requiring the user to import what they need. For example, if someone is doing io/corrections, they would do

import act.corrections
import act.io

or

from act.io import read_netcdf
from act.corrections import correct_ceil

For MetPy, at the top level, they import the basics (essentially just the xarray accessor), and require the user to import the rest when they are working with different parts of the package. For example:

from metpy.cbook import get_test_data
from metpy.plots import MapPanel, PanelContainer

Perhaps we could discuss this more in detail at the next ACT meeting. This would be a pretty big change, but it is the only way to get around the import time issue. There aren't any classes, moving imports, etc. we can change to magically fix this - there is a tradeoff between importing everything in one line, and importing things as the user needs them.

Jun 10 '22 13:06 mgrover1

This definitely needs more discussion but it might be worth it in the long run if we expect ACT to continue to grow in size. Or we work on excluding the test files and data first and then go from there. Definitely open, but we would want to run this by other users. I would be curious what @maxwellevin would have to say with the other projects that use ACT and if the import time is an issue.

Jun 10 '22 13:06 AdamTheisen

I agree this is worth discussing and testing. Moving the testing files out of the installed repo and improving the speed would be worth the effort to move ACT inline with other highly used and regarded repos. Now is our chance to make these changes that could affect usage. Making the change in the future will just be more difficult. Would it be possible to go the numpy route with both options working but issuing warnings for the full import at the start and then transitioning to not importing everything at the start in a few versions to give people time to make changes now?

Jun 10 '22 15:06 kenkehoe

@AdamTheisen for tsdat we haven't thought much about optimizing module loading, but we definitely will at some point. I don't know much about how to improve it outside of what has already been suggested here, so I am definitely interested in learning more about this. For kicks I ran the same thing as @mgrover1 (I think?) to generate tsdat's import profile and it looks pretty evenly split between xarray and act. I'm certain there are things we could be doing from the tsdat side of things to mitigate this, I just don't know how yet.

code

pip install tsdat tuna
python -X importtime -c "import tsdat" 2 > import.log
tuna import.log

I like @mgrover1's suggestion of changing how modules are loaded to improve performance in most cases. The downside, as already mentioned, is that the API does change quite a bit since you now have to manually import the submodules you need. That can be a bit annoying if your workflow involves using all the act submodules, but I think most people only use one or two immediately.

For tsdat we are currently only actually using the act.utils module directly, but we spend a good chunk of time importing the act.discovery and act.plotting modules. I've seen the split submodule approach in other libraries too – rich and scipy come to mind for me, but I'm sure there's more.

Jun 10 '22 17:06 maxwelllevin

@AdamTheisen @kenkehoe At SciPy last week, the Scientific Python group mentioned they have a list of SPECs for the community - including one on lazy loading of submodules (which is what we are aiming for here), speeding up import times.

Here is the SPEC - https://scientific-python.org/specs/spec-0001/

It is worth looking into this strategy + method to see if we can use it here.

Jul 18 '22 15:07 mgrover1

@mgrover1 this looks like a good option. Would take some initial effort up front to implement but could be worth it

Jul 18 '22 20:07 AdamTheisen

ACT ACT copied to clipboard

Speed up import time to load ACT

System | Self Time [us] | Cumulative Time [us]

ACT
ACT copied to clipboard