Reduce import time
The time to import spatialdata is significant, and can impact some usage scenarios.
(This issue is only for tracking; I don't consider it a priority at the current stage of the project. Once the API and code structure stabilize and become cleaner, it might be worth introducing some techniques to tackle this.)
Example
I am working on a project that takes about 4-5 seconds to import. Part of this is due to other libraries, but the biggest chunk is SpatialData.
```shell
$ python -m timeit -n 1 -r 1 -c "import spatialdata"
1 loop, best of 1: 2.28 sec per loop
```
For comparison, numpy takes 81 ms.
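To see where the time actually goes, CPython's built-in `python -X importtime -c "import spatialdata"` prints a per-module import breakdown. As a rough cross-check, cold-import time can also be measured in a fresh interpreter; a small sketch (shown with stdlib `json` so it runs anywhere):

```python
import subprocess
import sys
import time

def cold_import_time(module: str) -> float:
    """Time a cold import in a fresh interpreter (includes startup cost)."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start

# Interpreter startup is included in the measurement, so subtract a
# baseline import that is essentially free (sys is loaded at startup):
baseline = cold_import_time("sys")
print(f"json: {cold_import_time('json') - baseline:.3f} s")
```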
Use cases
It may not have a big impact for datascience in a Jupyter notebook where you import once and work all day with the same session.
- CLI tools: `my-tool --help` should respond immediately. This can be mitigated by removing top-level imports from my CLI tool's `__main__` and using local imports only in the command functions that actually need them. This way, SpatialData is imported only when the user issues a command that actually processes data with it, while other commands respond immediately.
- Napari plugins: with NPE2, plugins are imported when invoked in the GUI (for example, by clicking a menu item). When a user invokes a menu item that opens a plugin widget which uses SpatialData, the user interface remains unresponsive for multiple seconds. For a moment, users think nothing happened.
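The CLI mitigation described above can be sketched as follows; `my-tool` and its subcommands are hypothetical, and `spatialdata.read_zarr` stands in for whatever heavy call the command needs:

```python
import argparse

def cmd_process(args: argparse.Namespace) -> None:
    # Heavy import deferred to here: the cost is paid only when
    # the user actually runs `my-tool process <path>`.
    import spatialdata
    sdata = spatialdata.read_zarr(args.path)
    print(sdata)

def cmd_info(args: argparse.Namespace) -> None:
    # No heavy imports on this path, so this command responds instantly.
    print("my-tool 1.0")

def main(argv=None) -> None:
    parser = argparse.ArgumentParser(prog="my-tool")
    sub = parser.add_subparsers(dest="command", required=True)
    process = sub.add_parser("process")
    process.add_argument("path")
    process.set_defaults(func=cmd_process)
    sub.add_parser("info").set_defaults(func=cmd_info)
    args = parser.parse_args(argv)
    args.func(args)
```

Because no command function body runs at import time, `my-tool --help` and `my-tool info` never touch SpatialData.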
Workarounds
As users of SpatialData, we can:
- Use `if TYPE_CHECKING:` if we import data structures only for type hinting.
- Import submodules instead of top-level imports, e.g. `from spatialdata.transformations import Affine` instead of `import spatialdata`. The major cause is that many libraries expose much of their API at a higher-level module (`spatialdata.__init__`), which is also recommended since it allows making deeper module paths private and changing them when needed. I believe that a submodule import does not let Python skip the top-level module, so this may not help.
- Use local imports in functions. I would avoid this wherever possible, because it makes code less organized and readable, and is a source of bugs (some refactoring tools don't catch all local imports, and you would only notice at run time, if the buggy function is executed at all).
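The `TYPE_CHECKING` workaround from the first bullet looks like this; `SpatialData.images` is used as the example attribute here:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers (mypy, pyright, IDEs);
    # the heavy import is never executed at run time.
    from spatialdata import SpatialData

def count_images(sdata: SpatialData) -> int:
    # With `from __future__ import annotations`, the annotation above is a
    # plain string at run time, so the real class is not needed here.
    return len(sdata.images)
```

The limitation mentioned below applies: this only works where the name is used purely as an annotation, not where the real class is needed at run time (inheritance, `isinstance`, Pydantic validation).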
We cannot do anything when:
- We implement a class that inherits from a SpatialData class; we must import the real parent class.
- We implement a Pydantic model with a field of type SpatialData (validation cannot work with just a forward reference).
When you decide to tackle this, I suggest checking out how we did it for scanpy:
- https://github.com/scverse/scanpy/issues/756
There should be some useful snippets for you there. Scanpy also has some tests to make sure we don't reintroduce the causes of slow imports.
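A regression guard in that spirit can assert, in a fresh interpreter, that importing a package does not transitively pull in known-heavy modules. A sketch (not scanpy's actual test; the forbidden-module list is illustrative, and stdlib `json` is used so the demo runs anywhere):

```python
import subprocess
import sys

def assert_lean_import(package: str, forbidden: list[str]) -> None:
    """Fail if importing `package` transitively imports a forbidden module.

    Runs in a subprocess so imports already made in this process
    cannot leak into the measurement.
    """
    probe = (
        "import sys\n"
        f"import {package}\n"
        f"bad = [m for m in {forbidden!r} if m in sys.modules]\n"
        # Exit code 0 when clean; a non-empty message (exit code 1) otherwise.
        "raise SystemExit(', '.join(bad) if bad else 0)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True
    )
    assert result.returncode == 0, (
        f"importing {package!r} pulled in: {result.stderr.strip()}"
    )

# Importing stdlib `json` should not drag in heavy scientific packages:
assert_lean_import("json", ["numpy", "pandas", "matplotlib"])
```

Run as part of the test suite, such a check fails as soon as a new top-level import reintroduces a heavy dependency.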