Slow imports
This is a non-urgent issue that I noticed during #4345 -- running `sct_deepseg -h` takes a noticeably long time.
I'm curious as to why this is, and I would love to try to reduce the "startup" time for scripts if I can.
My first guess (though it's just a guess, and we should profile it) is that one or more of our import statements is heavy; probably one or more of the AI/ML frameworks we use. If that's the case, it should be easy to move the offending import statements inside the functions that actually use them.
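For example, a minimal sketch of the deferred-import pattern (`torch` is just a stand-in here; we haven't profiled which imports are actually the heavy ones yet):

```python
import numpy as np  # cheap import: fine to keep at module level

def apply_model(data: np.ndarray):
    # Heavy framework import deferred to the call site, so that
    # `script -h` (which never calls this function) doesn't pay for it.
    import torch
    return torch.from_numpy(data)
```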
Yeah, I think `ivadomed` plays a big role in the slow imports, which is partly why we would like to remove that dependency.
I renamed this issue because it concerns most SCT functions. E.g., it takes 15s (!) to run the help for `sct_maths`:
Terminal output:

```
julien-macbook:~ $ time sct_maths
--
Spinal Cord Toolbox (git-jca/4298-flow-d396ad3a9090900cc62d4b6eb4ccbc6d63d4a6eb)
sct_maths
--
usage: sct_maths -i <file> -o <file> [-h] [-add [...]] [-sub [...]] [-mul [...]] [-div [...]]
[-mean {x,y,z,t}] [-rms {x,y,z,t}] [-std {x,y,z,t}] [-bin <float>] [-otsu <int>]
[-adap <list>] [-otsu-median <list>] [-percent <int>] [-thr <float>]
[-uthr <float>] [-dilate <int>] [-erode <int>] [-shape {square,cube,disk,ball}]
[-dim {0,1,2}] [-smooth <list>] [-laplacian <list>] [-denoise DENOISE] [-mi <file>]
[-minorm <file>] [-corr <file>] [-symmetrize {0,1,2}]
[-type {uint8,int16,int32,float32,complex64,float64,int8,uint16,uint32,int64,uint64}]
[-v <int>]
Perform mathematical operations on images.
MANDATORY ARGUMENTS:
-i <file> Input file. Example: data.nii.gz
-o <file> Output file. Example: data_mean.nii.gz
OPTIONAL ARGUMENTS:
-h, --help Show this help message and exit
-v <int> Verbosity. 0: Display only errors/warnings, 1: Errors/warnings + info
messages, 2: Debug mode (default: 1)
BASIC OPERATIONS:
-add [ ...] Add following input. Can be a number or one or more 3D/4D images (separated
with space). Examples:
- sct_maths -i 3D.nii.gz -add 5 (Result: 3D image
with "5" added to each voxel)
- sct_maths -i 3D.nii.gz -add 3D_2.nii.gz (Result: 3D image)
- sct_maths -i 4D.nii.gz -add 4D_2.nii.gz (Result: 4D image)
- sct_maths -i 4D_nii.gz -add 4D_2.nii.gz 4D_3.nii.gz (Result: 4D image)
Note: If your terminal supports it, you can also specify multiple images
using a pattern:
- sct_maths -i 4D.nii.gz -add 4D_*.nii.gz (Result: Adding 4D_2.nii.gz,
4D_3.nii.gz, etc.)
Note: If the input image is 4D, you can also leave "-add" empty to sum the
3D volumes within the image:
- sct_maths -i 4D.nii.gz -add (Result: 3D image, with 3D
volumes summed within 4D image)
-sub [ ...] Subtract following input. Can be a number, or one or more 3D/4D images
(separated with space).
-mul [ ...] Multiply by following input. Can be a number, or one or more 3D/4D images
(separated with space). (See -add for examples.)
-div [ ...] Divide by following input. Can be a number, or one or more 3D/4D images
(separated with space).
-mean {x,y,z,t} Average data across dimension.
-rms {x,y,z,t} Compute root-mean-squared across dimension.
-std {x,y,z,t} Compute STD across dimension.
-bin <float> Binarize image using specified threshold. Example: 0.5
THRESHOLDING METHODS:
-otsu <int> Threshold image using Otsu algorithm (from skimage). Specify the number of
bins (e.g. 16, 64, 128)
-adap <list> Threshold image using Adaptive algorithm (from skimage). Provide 2 values
separated by ',' that correspond to the parameters below. For example,
'-adap 7,0' corresponds to a block size of 7 and an offset of 0.
- Block size: Odd size of pixel neighborhood which is used to calculate
the threshold value.
- Offset: Constant subtracted from weighted mean of neighborhood to
calculate the local threshold value. Suggested offset is 0.
-otsu-median <list> Threshold image using Median Otsu algorithm (from dipy). Provide 2 values
separated by ',' that correspond to the parameters below. For example,
'-otsu-median 3,5' corresponds to a filter size of 3 repeated over 5
iterations.
- Size: Radius (in voxels) of the applied median filter.
- Iterations: Number of passes of the median filter.
-percent <int> Threshold image using percentile of its histogram.
-thr <float> Lower threshold limit (zero below number).
-uthr <float> Upper threshold limit (zero above number).
MATHEMATICAL MORPHOLOGY:
-dilate <int> Dilate binary or greyscale image with specified size. If shape={'square',
'cube'}: size corresponds to the length of an edge (size=1 has no effect).
If shape={'disk', 'ball'}: size corresponds to the radius, not including the
center element (size=0 has no effect).
-erode <int> Erode binary or greyscale image with specified size. If shape={'square',
'cube'}: size corresponds to the length of an edge (size=1 has no effect).
If shape={'disk', 'ball'}: size corresponds to the radius, not including the
center element (size=0 has no effect).
-shape {square,cube,disk,ball}
Shape of the structuring element for the mathematical morphology operation.
Default: ball.
If a 2D shape {'disk', 'square'} is selected, -dim must be specified.
(default: ball)
-dim {0,1,2} Dimension of the array which 2D structural element will be orthogonal to.
For example, if you wish to apply a 2D disk kernel in the X-Y plane, leaving
Z unaffected, parameters will be: shape=disk, dim=2.
FILTERING METHODS:
-smooth <list> Gaussian smoothing filtering. Supply values for standard deviations in mm.
If a single value is provided, it will be applied to each axis of the image.
If multiple values are provided, there must be one value per image axis.
(Examples: "-smooth 2.0,3.0,2.0" (3D image), "-smooth 2.0" (any-D image)).
-laplacian <list> Laplacian filtering. Supply values for standard deviations in mm. If a
single value is provided, it will be applied to each axis of the image. If
multiple values are provided, there must be one value per image axis.
(Examples: "-laplacian 2.0,3.0,2.0" (3D image), "-laplacian 2.0" (any-D
image)).
-denoise DENOISE Non-local means adaptative denoising from P. Coupe et al. as implemented in
dipy. Separate with ". Example: p=1,b=3
p: (patch radius) similar patches in the non-local means are searched for
locally, inside a cube of side 2*p+1 centered at each voxel of interest.
Default: p=1
b: (block radius) the size of the block to be used (2*b+1) in the blockwise
non-local means implementation. Default: b=5 Note, block radius must be
smaller than the smaller image dimension: default value is lowered for
small images)
To use default parameters, write -denoise 1
SIMILARITY METRIC:
-mi <file> Compute the mutual information (MI) between both input files (-i and -mi) as
in: http://scikit-
learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html
-minorm <file> Compute the normalized mutual information (MI) between both input files (-i
and -mi) as in: http://scikit-learn.org/stable/modules/generated/sklearn.met
rics.normalized_mutual_info_score.html
-corr <file> Compute the cross correlation (CC) between both input files (-i and -cc).
MISC:
-symmetrize {0,1,2} Symmetrize data along the specified dimension.
-type {uint8,int16,int32,float32,complex64,float64,int8,uint16,uint32,int64,uint64}
Output type.
sct_maths: error: the following arguments are required: -i, -o
real 0m14.982s
user 0m2.961s
sys 0m1.111s
```
Note that re-running it a few minutes later is much faster (presumably because the OS still has the package files in its page cache, so the second run avoids most of the disk I/O):
Terminal output:

```
real 0m1.924s
user 0m2.403s
sys 0m0.687s
```
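(One way to test the page-cache theory, if we want to: flush the cache and time a cold vs. warm run. These are standard OS utilities, not SCT commands.)

```bash
# Linux: drop the page cache (requires root), then compare cold vs. warm runs
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
time sct_maths -h   # cold: pays disk I/O for every imported module
time sct_maths -h   # warm: files are already in the page cache

# macOS equivalent of dropping the cache:
# sudo purge
```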
One thing to note is that we currently have a test that runs the help on all scripts:
https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/6962e03b5906ab3466e6e330438dbea58d949407/testing/cli/test_cli.py#L19-L22
We could repurpose this test to also check for slow imports by clearing `sys.modules` at the start of each per-script test. This can be used to profile timing by passing, say, `--durations=10 --durations-min=1.0` (or a variation) to pytest.
Then, if we end up getting all of the slowest `-h` imports under a certain goal (e.g. 5s), we can preserve this maximum limit by using `pytest.mark.timeout()` (from the pytest-timeout plugin) to fail a test if it crosses the threshold we've set for ourselves.
(Though this may be tricky to enforce due to natural variance in OSes and runners. But I imagine there are lots of solutions out there for handling slow tests!)
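A rough sketch of how both ideas could fit together (hypothetical: the module path and `main()` signature are assumptions, and evicting C extensions like `torch` from `sys.modules` can be fragile, so a subprocess per script may end up being more robust):

```python
import importlib
import sys
import pytest

BASELINE = set(sys.modules)  # whatever pytest itself has already loaded

@pytest.fixture
def cold_imports():
    # Evict modules imported by earlier tests, so each script's `-h`
    # pays its own import cost; pytest's --durations then reports it.
    for name in set(sys.modules) - BASELINE:
        del sys.modules[name]

@pytest.mark.timeout(5)  # needs pytest-timeout; fails past our 5s budget
def test_sct_deepseg_help(cold_imports):
    script = importlib.import_module("spinalcordtoolbox.scripts.sct_deepseg")
    with pytest.raises(SystemExit):  # argparse exits after printing help
        script.main(["-h"])
```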
Quick and easy profiling:

```bash
pip install tuna
PYTHONPROFILEIMPORTTIME=1 sct_deepseg -h 2> import.log
tuna import.log
```

(`PYTHONPROFILEIMPORTTIME=1` is the environment-variable form of `python -X importtime`: it writes per-module import timings to stderr, which `tuna` then visualizes.)
The biggest offenders seem to be `ivadomed`, `torch`, `torchvision`, `wandb`, and `monai`.
Digression about PEP 690 – Lazy Imports
In terms of actually implementing lazy loading, I had the question: "Python style guides generally recommend keeping imports at the top of each module. Isn't lazy loading incompatible with this? What is the 'Pythonic' way to reduce import times?"
It turns out that PEP 690 – Lazy Imports (as well as its rejection) explicitly references this dilemma:
> Common Python code style prefers imports at module level, so they don't have to be repeated within each scope the imported object is used in, and to avoid the inefficiency of repeated execution of the import system at runtime. This means that importing the main module of a program typically results in an immediate cascade of imports of most or all of the modules that may ever be needed by the program.
>
> Consider the example of a Python command line program (CLI) with a number of subcommands. Each subcommand may perform different tasks, requiring the import of different dependencies. But a given invocation of the program will only execute a single subcommand, or possibly none (i.e. if just --help usage info is requested). Top-level eager imports in such a program will result in the import of many modules that will never be used at all; the time spent (possibly compiling and) executing these modules is pure waste.
I couldn't have written a more accurate summary of our current issue.
The reason I bring this up is that I imagine this PEP (and the associated official discussion, plus other discussions) is a perfect place to learn about alternative lazy-loading approaches.
And, indeed, it mentions:
> The Python standard library already includes built-in support for lazy imports, via `importlib.util.LazyLoader`.
The `importlib` documentation referenced by PEP 690 explicitly includes an "Implementing Lazy Imports" section, with further discussion in https://stackoverflow.com/q/77319516.
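For reference, the recipe from that section looks roughly like this (comments mine; this mirrors the stdlib docs rather than anything SCT-specific):

```python
import importlib.util
import sys

def lazy_import(name):
    """Import `name` lazily: its module code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the lazy module; doesn't execute it yet
    return module

lazy_typing = lazy_import("typing")  # a real module object, not yet executed
lazy_typing.TYPE_CHECKING            # first attribute access triggers the load
```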
I tried focusing just on `sct_deepseg -h`, and I was able to cut the time down dramatically using `lazy_import` in just a few places (see `jn/4370-fix-slow-imports` for the changes I made): from ~8.5s to under a second! (And I could have kept going with `nibabel`, etc., to save even more time.)
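As a sketch of the pattern, using the `lazy_import` recipe above (illustrative only; the exact call sites in the branch may differ, and `torch` here stands in for any heavy dependency):

```python
# At module top level, instead of `import torch`:
torch = lazy_import("torch")  # returns instantly; torch's code hasn't run yet

def run_inference(data):
    # First attribute access here triggers the real import, so
    # `sct_deepseg -h` (which never reaches this) stays fast.
    return torch.as_tensor(data)
```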
Ah! Super interesting and encouraging, given that `ivadomed` is on the way out, and `wandb`... well... there is no reason to have `wandb` here 😅 (it's only used for monitoring model training).