
Improve import time

Open Sachaa-Thanasius opened this issue 8 months ago • 3 comments

importlib_resources currently takes a while to import. From rough local testing with whatever Python versions I have installed, the figures are something like this:

| OS | Python Implementation | Python Version(s) | Import Time (ms) |
|---|---|---|---|
| Windows 10 | CPython | 3.9-3.13 | ~27-37 |
| Windows 10 | PyPy | 3.10 | ~89 |
| WSL 2 | CPython | 3.9-3.13 | ~8.5-12.5 |

(The numbers were obtained by cloning the main branch and running `<python> -m timeit -n 1 -r 1 -- "import importlib_resources"` many times outside a venv, i.e. a raw import without site-packages baggage. If need be, I can get more precise results.)
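For anyone wanting to reproduce figures like these, here is a minimal sketch of a comparable measurement (the helper name `import_time_ms` is hypothetical). It is not the exact `timeit` invocation above: it times only the `import` statement in a fresh interpreter per run and reports the median. CPython's `-X importtime` flag is another option if a per-module breakdown is wanted.

```python
import statistics
import subprocess
import sys


def import_time_ms(module: str, runs: int = 20, python: str = sys.executable) -> float:
    """Median time, in milliseconds, to import `module` in a fresh interpreter."""
    # A new process per run ensures the module is not already cached in sys.modules.
    code = (
        "import time; t0 = time.perf_counter(); "
        f"import {module}; "
        "print((time.perf_counter() - t0) * 1000)"
    )
    samples = []
    for _ in range(runs):
        result = subprocess.run(
            [python, "-c", code], capture_output=True, text=True, check=True
        )
        samples.append(float(result.stdout))
    return statistics.median(samples)
```

Calling `import_time_ms("importlib_resources")` from the root of a clone, outside a venv, should roughly mirror the setup described above.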

For context on my perspective, most of the standard library modules that I've interacted with take a fraction of that time.

To improve this, I've been messing about and managed to get the figures down by at least 7x:

| OS | Python Implementation | Python Version(s) | Import Time (ms) |
|---|---|---|---|
| Windows 10 | CPython | 3.9-3.13 | ~3.5-6.5 |
| Windows 10 | PyPy | 3.10 | ~5.5 |
| WSL 2 | CPython | 3.9-3.13 | ~0.6-1.5 |

This improvement mostly comes from the following:

  1. Replacing the inspect usage with a cheaper pattern that other stdlib modules use.
  2. Isolating typing-related symbols and annotations-related symbols so that they are only imported when deferred annotations using them are evaluated (with e.g. inspect.get_annotations or typing.get_type_hints).
  3. Isolating less-used but very expensive modules/submodules such that they are only imported on demand, i.e. only when deferred annotations containing them are evaluated or functions using them are called.
  4. Consolidating small submodules to avoid reading and compiling several extra files (also, it makes the structure more like importlib_metadata's, imo).

That's all before applying a few tricks used by other standard library modules such as importlib._bootstrap and importlib.util, e.g. avoiding imports of functools and contextlib at the cost of slightly more verbose code. Applying those as well would push the improvement to at least 10x.

Since this is a "foundational" library (meant to partially replace the widely used pkg_resources) that is also in the standard library, and since a fairly small amount of effort yields such a large improvement, would there be interest in pursuing this? If so, would it be acceptable for me to attempt it by upstreaming some of my work?

P.S. The branch I've been experimenting on is here if you're curious, though it does too many things (e.g. fixes some bugs, completes the typing, adds unnecessary personalization) for me to consider making a PR with it directly. Makes more sense to do things piecemeal.

Sachaa-Thanasius, Mar 30 '25 02:03

Just following up to gauge interest in making this happen.

cc @jaraco

Sachaa-Thanasius, Apr 06 '25 04:04

Thanks for the proposal! I'm open to these improvements, especially where the value exceeds the drawbacks. Here's what I recommend:

  • Perform independent changes in separate commits. This way, we can capture the specific performance benefits of each change and make choices about the tradeoffs of each commit separately.
  • Consider deferring the more invasive changes for later commits, as they're more likely to be rewritten or declined.
  • Where the changes aren't protected by regression tests, I'll often suggest including comments to protect the change (so it isn't "optimized" away by a subsequent change).
  • For very independent changes, consider contributing them as separate PRs.
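As a complement to protective comments, a regression test can check that importing a package does not drag in a module an optimization was meant to remove. A minimal sketch (the helper `pulls_in` is hypothetical) that performs the check in a fresh interpreter:

```python
import subprocess
import sys


def pulls_in(module: str, unwanted: str) -> bool:
    """True if importing `module` in a fresh interpreter also loads `unwanted`."""
    # A subprocess keeps the check independent of whatever the test runner imported.
    code = f"import sys, {module}; print({unwanted!r} in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"
```

A test could then assert `not pulls_in("importlib_resources", "inspect")`, assuming the change from item 1 has landed.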

> importlib_resources currently takes a while to import.

It's all relative. This library dramatically improved import time over pkg_resources.

> If need be, I can get more precise results.

No need, but let's definitely collect per-commit differences in some environment (CPython, any OS).

> 4. Consolidating small submodules to avoid reading and compiling several extra files (also, it makes the structure more like importlib_metadata's, imo).

I'll be resistant to this change, especially if it degrades the cognitive benefits of separation of concerns, but feel free to propose it.

> I was wondering if there would be interest in improving this, and if so, whether it would be acceptable for me to attempt that by upstreaming some of my work?

Definitely! And once contributed here, it will get rolled into CPython as well (as importlib.resources). Thanks for contributing here as it provides a richer way to contribute faster.

Feel free also to contribute any bug fixes as well (in separate PRs).

jaraco, Apr 06 '25 06:04

I'll get on it, then. We'll see how this goes.

Sachaa-Thanasius, Apr 06 '25 22:04