
Speed-up larch/xas_viewer at start

maurov opened this issue 2 years ago • 3 comments

@newville I get many reports that xas_viewer and the larch shell are "very slow" to start up. I have this impression too, and I would like to investigate it further. Is there a real need to check the installation and the possibility to upgrade on every run?

maurov • Jun 09 '22 09:06

@maurov For sure, on "first run" it may be checking for and then pip-installing more packages. After that, it checks locally for some packages but that itself is quick.
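
For illustration, a "first run" dependency check along those lines might look like the following minimal sketch; the function name and flow here are hypothetical, not larch's actual code:

import importlib.util
import subprocess
import sys

def ensure_installed(packages):
    # hypothetical sketch of a first-run check: find packages that are not
    # importable and pip-install them into the current environment
    missing = [p for p in packages if importlib.util.find_spec(p) is None]
    if missing:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])

On later runs nothing is missing, so the check reduces to a few find_spec() lookups, which is quick.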

For XAS Viewer, the "check if update is available" happens in a separate thread, and should not delay using the windows. This thread fetches a URL from github.com (so should not be worse in Grenoble or Sao Paulo or Tsukuba than Chicago) with a timeout of 3.1 seconds. For me, it returns instantly, including if I turn off wifi on my laptop.
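As a sketch of that pattern (a version check on a daemon thread with a short timeout, so the GUI is never blocked), the structure is roughly the following; the URL, callback, and function name are placeholders, not larch's actual implementation:

import threading
import urllib.request

RELEASES_URL = "https://example.invalid/latest-release"  # placeholder URL

def check_for_update(on_done, timeout=3.1):
    """Fetch release info on a background thread; never block the caller."""
    def worker():
        try:
            with urllib.request.urlopen(RELEASES_URL, timeout=timeout) as resp:
                on_done(resp.read())
        except OSError:
            on_done(None)  # offline or timed out: silently skip the check
    threading.Thread(target=worker, daemon=True).start()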

But: I think the slowness could be studied more. On a new-ish laptop, I currently get

>>> import time ; t0 = time.time() ; import larch ; print(time.time() - t0)
1.8603730201721191

Of course, I am totally willing to say that is the "best case". In more detail, with a script of

import time
import sys
import copy
import importlib

start = time.time()

# packages used by a typical Larch session, roughly in dependency order,
# ending with larch itself
modules = ('numpy', 'scipy.optimize', 'pandas', 'matplotlib', 'wx',
           'urllib3.response', 'h5py', 'hdf5plugin', 'sklearn', 'silx',
           'pymatgen', 'pyFAI', 'tomopy', 'sqlalchemy', 'ast', 'asteval',
           'lmfit', 'matplotlib.backends.backend_wxagg',
           'epics', 'larch')

allmods = {}
for m in modules:
    __import__(m)
    # snapshot sys.modules so the module sets can be compared afterwards
    allmods[m] = copy.copy(sys.modules)
    # report cumulative time and total number of loaded modules
    print(f"module {m}: {time.time()-start:.4f}, {len(sys.modules)}")

# optionally, write out the modules pulled in by larch itself and nothing before it:
# with open('larch_only_modules.txt', 'w') as fh:
#     for i in allmods['larch']:
#         if i not in allmods['epics']:
#             fh.write(f"{i}\n")
#     fh.write('\n')

I get accumulated times of:

module numpy: 0.3067, 226
module scipy.optimize: 0.4008, 473
module pandas: 0.5500, 819
module wx: 0.5887, 824
module urllib3.response: 0.6156, 887
module h5py: 0.6435, 929
module hdf5plugin: 0.6509, 933
module sklearn: 0.7899, 1136
module silx: 0.7904, 1139
module pymatgen: 0.7906, 1140
module pyFAI: 0.8842, 1257
module tomopy: 0.9440, 1440
module sqlalchemy: 1.0036, 1545
module ast: 1.0037, 1545
module asteval: 1.0047, 1549
module lmfit: 1.1119, 1636
module matplotlib: 1.1119, 1636
module matplotlib.backends.backend_wxagg: 1.4261, 1707
module epics: 1.4290, 1717
module larch: 1.8324, 2414

So, yeah, larch is adding to that, and it could be looked into.

newville • Jun 09 '22 12:06

> But: I think the slowness could be studied more. On a new-ish laptop, I currently get
>
> >>> import time ; t0 = time.time() ; import larch ; print(time.time() - t0)
> 1.8603730201721191

Well, this is also what I get on the beamline machine, which is a rather powerful computer. Almost 2 seconds to load the larch shell on such hardware seems a lot to me. In fact, when I start larch on a virtual machine on the ESRF cluster, this goes up to 10-20 seconds.

Is there a need to load everything at startup? We could load modules only when needed, right?
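
For what it's worth, one standard way to defer imports in a package is a module-level __getattr__ (PEP 562), so heavy submodules are only imported on first access. A minimal sketch, with purely hypothetical submodule names (this is not how larch is currently organized):

# hypothetical mypackage/__init__.py: defer heavy submodules until first use
import importlib

_lazy_submodules = {'xafs', 'xrd', 'wxlib'}   # example names only

def __getattr__(name):
    if name in _lazy_submodules:
        module = importlib.import_module(f"{__name__}.{name}")
        globals()[name] = module   # cache so __getattr__ runs only once per name
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

With this pattern, `import mypackage` stays cheap and `mypackage.xafs` triggers the real import on first access.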

Anyway, that's not top priority for the moment.

maurov • Jun 09 '22 15:06

@maurov I'm not sure this is really a speedup for launching XAS_Viewer (or even larch CLI), but I did some cleanup of imports on initialization, so that

>>> import time ; t0 = time.time() ; import larch ; print(time.time() - t0)

takes about half the time, and the more complete script gives:

module numpy: 0.3068, 226
module scipy.optimize: 0.4053, 473
module pandas: 0.5509, 819
module matplotlib: 0.6316, 878
module wx: 0.6814, 883
module urllib3.response: 0.7080, 946
module h5py: 0.7339, 988
module hdf5plugin: 0.7415, 992
module sklearn: 0.8766, 1193
module silx: 0.8771, 1196
module pymatgen: 0.8772, 1197
module pyFAI: 0.9642, 1301
module tomopy: 1.0259, 1484
module sqlalchemy: 1.1050, 1589
module ast: 1.1051, 1589
module asteval: 1.1059, 1593
module lmfit: 1.1207, 1636
module matplotlib.backends.backend_wxagg: 1.4339, 1707
module epics: 1.4368, 1717
module larch: 1.4773, 1821

So, I conclude that many packages take ~0.1 seconds to import, and some (numpy, pandas, sklearn, matplotlib.backends.backend_wxagg) take a bit more. We import a lot of packages, and I don't think any one or two of them stand out as "ah, there is the main problem".

I sort of doubt that will really speed up the launching of XAS Viewer by very much.

But for sure, total startup times of more than 10 seconds are definitely annoying and worth trying to figure out. That could be one package (say matplotlib.backends.backend_wxagg or some sort of networking package) really dominating the time.
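
As a practical note, running `python -X importtime -c "import larch"` (available since Python 3.7) writes per-module import times, both self and cumulative, to stderr; on a machine where startup takes 10+ seconds that should make it easy to see whether a single package dominates the time.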

newville • Jun 16 '22 16:06