Slow imports
This is a non-urgent issue that I noticed during #4345 -- running `sct_deepseg -h` takes a noticeably long time.
I'm curious as to why this is, and I would love to try to reduce the "startup" time for scripts if I can.
My first guess (though it's just a guess, and we should profile it) is that one or more of our import statements is heavy; probably one or more of the AI/ML frameworks we use. If that's the case, it should be easy to move the offending import statements inside the functions that actually use them.
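For example, a minimal sketch of the deferred-import pattern (`torch` is just a stand-in here; we haven't profiled which imports are actually the heavy ones yet):

```python
import numpy as np  # cheap import: fine to keep at module level

def apply_model(data: np.ndarray):
    # Heavy framework import deferred to the call site, so that
    # `script -h` (which never calls this function) doesn't pay for it.
    import torch
    return torch.from_numpy(data)
```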
Yeah, I think `ivadomed` plays a big role in the slow imports, which is partly why we would like to remove that dependency.
I renamed this issue because it concerns most SCT functions. E.g., it takes 15s (!) to run the help for `sct_maths`:
Terminal output:

```
julien-macbook:~ $ time sct_maths
--
Spinal Cord Toolbox (git-jca/4298-flow-d396ad3a9090900cc62d4b6eb4ccbc6d63d4a6eb)
sct_maths
--
usage: sct_maths -i <file> -o <file> [-h] [-add [...]] [-sub [...]] [-mul [...]] [-div [...]]
[-mean {x,y,z,t}] [-rms {x,y,z,t}] [-std {x,y,z,t}] [-bin <float>] [-otsu <int>]
[-adap <list>] [-otsu-median <list>] [-percent <int>] [-thr <float>]
[-uthr <float>] [-dilate <int>] [-erode <int>] [-shape {square,cube,disk,ball}]
[-dim {0,1,2}] [-smooth <list>] [-laplacian <list>] [-denoise DENOISE] [-mi <file>]
[-minorm <file>] [-corr <file>] [-symmetrize {0,1,2}]
[-type {uint8,int16,int32,float32,complex64,float64,int8,uint16,uint32,int64,uint64}]
[-v <int>]
Perform mathematical operations on images.
MANDATORY ARGUMENTS:
-i <file> Input file. Example: data.nii.gz
-o <file> Output file. Example: data_mean.nii.gz
OPTIONAL ARGUMENTS:
-h, --help Show this help message and exit
-v <int> Verbosity. 0: Display only errors/warnings, 1: Errors/warnings + info
messages, 2: Debug mode (default: 1)
BASIC OPERATIONS:
-add [ ...] Add following input. Can be a number or one or more 3D/4D images (separated
with space). Examples:
- sct_maths -i 3D.nii.gz -add 5 (Result: 3D image
with "5" added to each voxel)
- sct_maths -i 3D.nii.gz -add 3D_2.nii.gz (Result: 3D image)
- sct_maths -i 4D.nii.gz -add 4D_2.nii.gz (Result: 4D image)
- sct_maths -i 4D_nii.gz -add 4D_2.nii.gz 4D_3.nii.gz (Result: 4D image)
Note: If your terminal supports it, you can also specify multiple images
using a pattern:
- sct_maths -i 4D.nii.gz -add 4D_*.nii.gz (Result: Adding 4D_2.nii.gz,
4D_3.nii.gz, etc.)
Note: If the input image is 4D, you can also leave "-add" empty to sum the
3D volumes within the image:
- sct_maths -i 4D.nii.gz -add (Result: 3D image, with 3D
volumes summed within 4D image)
-sub [ ...] Subtract following input. Can be a number, or one or more 3D/4D images
(separated with space).
-mul [ ...] Multiply by following input. Can be a number, or one or more 3D/4D images
(separated with space). (See -add for examples.)
-div [ ...] Divide by following input. Can be a number, or one or more 3D/4D images
(separated with space).
-mean {x,y,z,t} Average data across dimension.
-rms {x,y,z,t} Compute root-mean-squared across dimension.
-std {x,y,z,t} Compute STD across dimension.
-bin <float> Binarize image using specified threshold. Example: 0.5
THRESHOLDING METHODS:
-otsu <int> Threshold image using Otsu algorithm (from skimage). Specify the number of
bins (e.g. 16, 64, 128)
-adap <list> Threshold image using Adaptive algorithm (from skimage). Provide 2 values
separated by ',' that correspond to the parameters below. For example,
'-adap 7,0' corresponds to a block size of 7 and an offset of 0.
- Block size: Odd size of pixel neighborhood which is used to calculate
the threshold value.
- Offset: Constant subtracted from weighted mean of neighborhood to
calculate the local threshold value. Suggested offset is 0.
-otsu-median <list> Threshold image using Median Otsu algorithm (from dipy). Provide 2 values
separated by ',' that correspond to the parameters below. For example,
'-otsu-median 3,5' corresponds to a filter size of 3 repeated over 5
iterations.
- Size: Radius (in voxels) of the applied median filter.
- Iterations: Number of passes of the median filter.
-percent <int> Threshold image using percentile of its histogram.
-thr <float> Lower threshold limit (zero below number).
-uthr <float> Upper threshold limit (zero above number).
MATHEMATICAL MORPHOLOGY:
-dilate <int> Dilate binary or greyscale image with specified size. If shape={'square',
'cube'}: size corresponds to the length of an edge (size=1 has no effect).
If shape={'disk', 'ball'}: size corresponds to the radius, not including the
center element (size=0 has no effect).
-erode <int> Erode binary or greyscale image with specified size. If shape={'square',
'cube'}: size corresponds to the length of an edge (size=1 has no effect).
If shape={'disk', 'ball'}: size corresponds to the radius, not including the
center element (size=0 has no effect).
-shape {square,cube,disk,ball}
Shape of the structuring element for the mathematical morphology operation.
Default: ball.
If a 2D shape {'disk', 'square'} is selected, -dim must be specified.
(default: ball)
-dim {0,1,2} Dimension of the array which 2D structural element will be orthogonal to.
For example, if you wish to apply a 2D disk kernel in the X-Y plane, leaving
Z unaffected, parameters will be: shape=disk, dim=2.
FILTERING METHODS:
-smooth <list> Gaussian smoothing filtering. Supply values for standard deviations in mm.
If a single value is provided, it will be applied to each axis of the image.
If multiple values are provided, there must be one value per image axis.
(Examples: "-smooth 2.0,3.0,2.0" (3D image), "-smooth 2.0" (any-D image)).
-laplacian <list> Laplacian filtering. Supply values for standard deviations in mm. If a
single value is provided, it will be applied to each axis of the image. If
multiple values are provided, there must be one value per image axis.
(Examples: "-laplacian 2.0,3.0,2.0" (3D image), "-laplacian 2.0" (any-D
image)).
-denoise DENOISE Non-local means adaptative denoising from P. Coupe et al. as implemented in
dipy. Separate with ". Example: p=1,b=3
p: (patch radius) similar patches in the non-local means are searched for
locally, inside a cube of side 2*p+1 centered at each voxel of interest.
Default: p=1
b: (block radius) the size of the block to be used (2*b+1) in the blockwise
non-local means implementation. Default: b=5 Note, block radius must be
smaller than the smaller image dimension: default value is lowered for
small images)
To use default parameters, write -denoise 1
SIMILARITY METRIC:
-mi <file> Compute the mutual information (MI) between both input files (-i and -mi) as
in: http://scikit-
learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html
-minorm <file> Compute the normalized mutual information (MI) between both input files (-i
and -mi) as in: http://scikit-learn.org/stable/modules/generated/sklearn.met
rics.normalized_mutual_info_score.html
-corr <file> Compute the cross correlation (CC) between both input files (-i and -cc).
MISC:
-symmetrize {0,1,2} Symmetrize data along the specified dimension.
-type {uint8,int16,int32,float32,complex64,float64,int8,uint16,uint32,int64,uint64}
Output type.
sct_maths: error: the following arguments are required: -i, -o
real 0m14.982s
user 0m2.961s
sys 0m1.111s
```
Note that re-running it a few minutes later is much faster (presumably because the OS still has the package files in its page cache, so the second run avoids most of the disk I/O):
Terminal output:

```
real 0m1.924s
user 0m2.403s
sys 0m0.687s
```
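(One way to test the page-cache theory, if we want to: flush the cache and time a cold vs. warm run. These are standard OS utilities, not SCT commands.)

```bash
# Linux: drop the page cache (requires root), then compare cold vs. warm runs
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
time sct_maths -h   # cold: pays disk I/O for every imported module
time sct_maths -h   # warm: files are already in the page cache

# macOS equivalent of dropping the cache:
# sudo purge
```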
One thing to note is that we currently have a test that runs the help on all scripts:
https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/6962e03b5906ab3466e6e330438dbea58d949407/testing/cli/test_cli.py#L19-L22
We could repurpose this test to also check for slow imports by clearing `sys.modules` at the start of each per-script test. This can be used to profile timing by passing, say, `--durations=10 --durations-min=1.0` (or a variation) to pytest.
Then, if we end up getting all of the slowest `-h` imports under a certain goal (e.g. 5s), we can preserve this maximum limit by using `pytest.mark.timeout()` (from the pytest-timeout plugin) to fail a test if it crosses the threshold we've set for ourselves.
(Though this may be tricky to enforce due to natural variance in OSes and runners. But I imagine there are lots of solutions out there for handling slow tests!)
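A rough sketch of how both ideas could fit together (hypothetical: the module path and `main()` signature are assumptions, and evicting C extensions like `torch` from `sys.modules` can be fragile, so a subprocess per script may end up being more robust):

```python
import importlib
import sys
import pytest

BASELINE = set(sys.modules)  # whatever pytest itself has already loaded

@pytest.fixture
def cold_imports():
    # Evict modules imported by earlier tests, so each script's `-h`
    # pays its own import cost; pytest's --durations then reports it.
    for name in set(sys.modules) - BASELINE:
        del sys.modules[name]

@pytest.mark.timeout(5)  # needs pytest-timeout; fails past our 5s budget
def test_sct_deepseg_help(cold_imports):
    script = importlib.import_module("spinalcordtoolbox.scripts.sct_deepseg")
    with pytest.raises(SystemExit):  # argparse exits after printing help
        script.main(["-h"])
```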
Quick and easy profiling:

```bash
pip install tuna
PYTHONPROFILEIMPORTTIME=1 sct_deepseg -h 2> import.log
tuna import.log
```

(`PYTHONPROFILEIMPORTTIME=1` is the environment-variable form of `python -X importtime`: it writes per-module import timings to stderr, which `tuna` then visualizes.)
The biggest offenders seem to be `ivadomed`, `torch`, `torchvision`, `wandb`, and `monai`.
Digression about PEP 690 – Lazy Imports
In terms of actually implementing lazy loading, I had the question: "Python style guides generally recommend keeping imports at the top of each module. Isn't lazy loading incompatible with this? What is the 'Pythonic' way to reduce import times?"
It turns out that PEP 690 – Lazy Imports (as well as its rejection) explicitly references this dilemma:
> Common Python code style prefers imports at module level, so they don't have to be repeated within each scope the imported object is used in, and to avoid the inefficiency of repeated execution of the import system at runtime. This means that importing the main module of a program typically results in an immediate cascade of imports of most or all of the modules that may ever be needed by the program.
>
> Consider the example of a Python command line program (CLI) with a number of subcommands. Each subcommand may perform different tasks, requiring the import of different dependencies. But a given invocation of the program will only execute a single subcommand, or possibly none (i.e. if just --help usage info is requested). Top-level eager imports in such a program will result in the import of many modules that will never be used at all; the time spent (possibly compiling and) executing these modules is pure waste.
I couldn't have written a more accurate summary of our current issue.
The reason I bring this up is that I imagine this PEP (and the associated official discussion, plus other discussions) is a perfect place to learn about alternative lazy-loading approaches.
And, indeed, it mentions:
> The Python standard library already includes built-in support for lazy imports, via `importlib.util.LazyLoader`.
The `importlib` documentation referenced by PEP 690 explicitly includes an "Implementing Lazy Imports" section, with further discussion in https://stackoverflow.com/q/77319516.
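For reference, the recipe from that section looks roughly like this (comments mine; this mirrors the stdlib docs rather than anything SCT-specific):

```python
import importlib.util
import sys

def lazy_import(name):
    """Import `name` lazily: its module code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the lazy module; doesn't execute it yet
    return module

lazy_typing = lazy_import("typing")  # a real module object, not yet executed
lazy_typing.TYPE_CHECKING            # first attribute access triggers the load
```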
I tried focusing just on `sct_deepseg -h`, and I was able to cut the time down dramatically using `lazy_import` in just a few places (see `jn/4370-fix-slow-imports` for the changes I made): from ~8.5s to under a second! (And I could have kept going with `nibabel`, etc., to save even more time.)
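As a sketch of the pattern, using the `lazy_import` recipe above (illustrative only; the exact call sites in the branch may differ, and `torch` here stands in for any heavy dependency):

```python
# At module top level, instead of `import torch`:
torch = lazy_import("torch")  # returns instantly; torch's code hasn't run yet

def run_inference(data):
    # First attribute access here triggers the real import, so
    # `sct_deepseg -h` (which never reaches this) stays fast.
    return torch.as_tensor(data)
```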
Ah! Super interesting and encouraging, given that `ivadomed` is on the way out, and `wandb`... well... there is no reason to have `wandb` here 😅 (it's only used for monitoring model training).