Consider reverting lazy imports
I would like to discuss reverting the changes that introduced the lazy-loader package. I have been unhappy with this change almost from the start (I made a positive comment in the initial discussion, but that was before I fully understood how lazy loading was implemented), and I think it is not good for the project's health in the long run, for the following reasons:
1. Lazy loading is essentially a sophisticated workaround rather than a supported Python feature. The relevant PEP 690 was rejected, and the related discussion suggests it's unlikely to be officially adopted in the future. As a result, the mechanism is not guaranteed to work in all cases and has already shown instances of breaking.
2. The package relies on `.pyi` files for runtime behavior, which contradicts their intended purpose. Official references, such as PEP 561 and PEP 484, explicitly state that stub files are strictly meant for static type checking, not execution.
3. Lazy loading can be useful for CLI tools or packages with many submodules, where startup time is a major concern. However, I think the advantages for MNE-Python are questionable:
   - The import time wasn't particularly long to begin with.
   - While a 50% reduction in relative terms sounds significant, the absolute gain is just 175 milliseconds, which is negligible.
   - We already used nested imports for optional dependencies, and nested imports can also be used to import packages on demand without the need for the non-standard `lazy-loader` mechanism (see the sketch after this list).
   - Delaying imports until first use might be more disruptive than a slightly longer initial import time.
4. The `lazy-loader` package does not appear to be very actively maintained beyond some tooling updates, and there are several unresolved issues, such as all packages being loaded at once (issue #131) and eager imports not working as expected (issue #128).
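To make point 3 concrete: the nested-import pattern mentioned above looks roughly like this (a minimal sketch; the function is illustrative, not actual MNE code):

```python
import numpy as np


def fast_svd(a):
    # Deferred ("nested") import: scipy.linalg is loaded the first time
    # this function is called, not when the package itself is imported.
    from scipy import linalg

    return linalg.svd(np.asarray(a), full_matrices=False)
```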
Given these concerns, I strongly suggest that we at least reconsider our decision to adopt the lazy-loader package. Since this will be a rather large change involving many files and probably a lot of manual work, I'm happy to submit a PR if there is general agreement that this is the right direction. It is also not super urgent, but I think the sooner we address this, the easier it will be to revert the changes.
I lean on the same side here...
This comment has the relevant backstory links that @cbrnr forgot to include:
https://github.com/mne-tools/mne-python/issues/12388#issuecomment-1910876635
We started with https://github.com/mne-tools/mne-python/pull/11838 (but did it a dumb way, without `.pyi` files). This caused problems with IDE completion hints etc., as discussed in https://github.com/mne-tools/mne-python/pull/12059 (that's where all the opinions are aired). The IDE problem was fixed by https://github.com/mne-tools/mne-python/pull/12072.
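For readers who haven't followed those PRs: the stub-based pattern that lazy_loader documents (and that #12072 converged on) looks roughly like this sketch, with `mypkg` and its contents as illustrative stand-ins:

```python
# mypkg/__init__.py -- lazy_loader reads the adjacent __init__.pyi stub
# at runtime and defers the actual imports until attribute access:
import lazy_loader as lazy

__getattr__, __dir__, __all__ = lazy.attach_stub(__name__, __file__)

# mypkg/__init__.pyi -- the type stub doubles as the import specification
# (this dual use of .pyi files is what the complaint above is about):
#
#     from . import io as io
#     from .core import Thing as Thing
```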
It's not the entire backstory, just some aspect. I think I summarized all important points in my initial post.
> It's not the entire backstory, just some aspect.
Sure. I wasn't trying to imply that your description wasn't relevant. Just that there are some other things that are also relevant that you didn't include.
> I think I summarized all important points in my initial post.
I disagree. Past discussions / decisions are relevant and important context. I'm surprised that you seem to be saying that you omitted them intentionally.
Of course I'm not saying I omitted something intentionally, come on!
> Of course I'm not saying I omitted something intentionally, come on!
I guess I misunderstood. Saying "I think I summarized all important points in my initial post" implied that you didn't think linking to past discussions was important, so you chose not to do it. Which, again, was surprising to me, especially since it's a frequent request when old decisions get revisited --- I assumed you'd have thought of doing so.
I agree with @cbrnr I don't see much value in lazy-loading and I'm also concerned about the package maintenance and adoption. +1 to revert it.
> 4. The `lazy-loader` package does not appear to be very actively maintained beyond some tooling updates, and there are several unresolved issues such as all packages being loaded at once (issue #131) and eager imports not working as expected (issue #128).
I find this concerning and I never liked the fact that we now critically depend upon a hack that didn't pass standardization (via a PEP) – for IMHO very good reasons. If forced to vote, I'd decide against the current lazy loader implementation.
That said, so far I'm not running into problems (anymore) so I actually don't mind (for now).
FWIW I just briefly checked some of the projects that are listed as having "endorsed" SPEC-0001: NumPy, SciPy, xarray.
In none of the above could I find traces of the lazy-loader dependency!? Either I'm missing something or the way the SPEC website presents itself is kind of misleading, suggesting adoption where there is none. Happy to be corrected, of course, but right now I'm kind of … confused.
I still like the lazy loading stuff but can live with the majority decision here. I guess we'll see what problems / harder decisions come out of #13124 -- one we've already hit that people should hopefully discuss is how to handle the sklearn dependency in mne.decoding: https://github.com/mne-tools/mne-python/pull/13124#issuecomment-2678596013
> In none of the above could I find traces of the lazy-loader dependency!? Either I'm missing something or the way the SPEC website presents itself is kind of misleading, suggesting adoption where there is none. Happy to be corrected, of course, but right now I'm kind of … confused.
FWIW, just looking into one: SciPy uses a lazy loading approach even if it's not via lazy_loader, see https://github.com/scipy/scipy/pull/15230 / https://github.com/scipy/scipy/blob/b9b8b8171fd1453b42fc4492a279c71b54141c51/scipy/__init__.py#L129. I think for DRY/community adoption it would be better to use lazy_loader, but I haven't looked into why they use their own code. I'm assuming the same would probably hold for NumPy, xarray, etc.
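For reference, the SciPy approach linked above boils down to a module-level `__getattr__` (PEP 562) in the package's `__init__.py`. A simplified sketch, with `mypkg` and the submodule names as stand-ins:

```python
# mypkg/__init__.py -- lazy submodule loading via PEP 562, with no
# external dependency. Submodules are imported on first attribute access.
import importlib

_submodules = {"io", "viz", "decoding"}  # illustrative names


def __getattr__(name):
    if name in _submodules:
        return importlib.import_module(f"{__name__}.{name}")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")


def __dir__():
    # Advertise the lazy submodules so TAB completion still finds them.
    return sorted(set(globals()) | _submodules)
```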
Maybe they have similar issues with the package like the ones I raised here?
Could be, or could be that nobody has taken the time to switch over. Someone would have to look into it and investigate further.
> Lazy loading can be useful for CLI tools or packages with many submodules, ...
One point missed here, I think, is that anything we change has consequences for any downstream package that does `import mne`. So if our import time goes up, all of theirs do, too. So we either force them to live with the import time bump, or ask them to nest their imports or do some lazy importing of their own.
> The import time wasn't particularly long to begin with.
One more point of data here... the original PR that added it had timings that went from ~500 ms to ~160 ms (https://github.com/mne-tools/mne-python/pull/2001). Looking a bit now:
```
$ time python -c "import mne"
real    0m0.315s
$ time python -c "import mne; import scipy.linalg"
real    0m0.433s
$ time python -c "import mne; import scipy.linalg; from sklearn.base import BaseEstimator"
real    0m0.943s
```
As someone who uses MNE in other packages that I launch from the CLI a lot (i.e., psychophysics experiments), having it creep up to half a second (or a second!) to `import mne` is pretty painful. So if we do revert lazy loading, I think we need to re-nest a lot of imports (probably the same list, plus the associated test, from before we added lazy_loader in the first place). And adding sklearn as a hard dependency / invoking it when doing `import mne` seems like a bad idea.
No objection against nesting the slow imports as we previously did (including sklearn).
sklearn isn't as simple; copying from https://github.com/mne-tools/mne-python/pull/13124#issuecomment-2678596013 so we can discuss in one place:

For sklearn we have a harder decision. Without lazy loading `mne.decoding`, we basically have two choices:

1. Require people to import `mne.decoding` before accessing anything in that namespace. So you could no longer do:

   ```python
   import mne

   mne.decoding.CSP(...)
   ```

   You would need an `import mne.decoding` instead (or `from mne.decoding import CSP`, etc.). A lot of other packages are this way, such as `sklearn` (and SciPy was at some point, might still be).

2. Make `sklearn` a hard requirement of MNE.
I guess I'd vote for (1). It's a bit weird if we do it just for one submodule, but I guess it's justified because it's the only one that requires an optional dependency for its class hierarchy.
Beyond that, I'm not sure how we'd get classes in mne.decoding to properly subclass sklearn classes (e.g., `TransformerMixin`, `MetaEstimatorMixin`, etc.). Before lazy loading we had our own duplicated, half-baked, partially incorrect variants of these subclasses, which we definitely should not go back to. They were a big maintenance burden and constantly breaking, in addition to being a bad violation of DRY, not working with sklearn class validation tools, etc.
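To spell out the constraint: subclassing happens at class-definition time, i.e. when the module is imported, so sklearn must be importable before any such class can even be defined. A toy sketch (not an actual MNE class):

```python
from sklearn.base import BaseEstimator, TransformerMixin


class DemeanTransformer(TransformerMixin, BaseEstimator):
    """Toy transformer: inheriting the real sklearn mixins provides
    get_params/set_params, fit_transform, and compatibility with
    sklearn's estimator-validation tooling for free."""

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_
```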
> Before lazy loading we had our own duplicated, half-baked, partially incorrect variants of these subclasses, which we definitely should not go back to.
+100
Well, the ship may have already sailed, but I'd like to take the time to enumerate some arguments on both sides of this issue.
**In favor of keeping lazy loader**
1. `import mne` is faster. Even if it's less than 200 ms, abandoning it now will be a performance regression. In a single interactive session you might not notice, but over dozens of CI runs per day (many of which we pay for) those seconds will add up. MNE import time also impacts the import time of all downstream projects that themselves import MNE.
2. We get the benefit of nesting expensive dependency imports without actually having to nest them. Previously there were many PRs where a contributor failed to nest a SciPy import, which caused a test failure that they often didn't know how to interpret (or even access), meaning maintainer time was spent explaining, coaching, and/or fixing the problem. With lazy_loader, there's no longer a need to nest external dependencies, which aligns with contributors' default expectations.
3. It's not causing us any problems. MNE-Python isn't experiencing any of the problems cited by @cbrnr:
   - The first one (https://github.com/scientific-python/lazy-loader/issues/131) is an edge case that doesn't affect us: it behaves suboptimally when dependency packages aren't found (AKA, when your environment has problems, it doesn't work perfectly).
   - The other one (https://github.com/scientific-python/lazy-loader/issues/128) looks quite similar to something we hit and fixed. It was a genuine problem with MNE that was revealed by lazy_loader adoption, not a lazy_loader bug. Since #12072, things have been working smoothly for us AFAICT.
4. Less churn for contributors. As mentioned in (2) above, our import preference/policy now aligns with what people want/expect to do anyway (put external imports at the top of the file). It's a simpler system than what we had before (only some external imports got nested), and for folks who have been contributing in the last few years, I worry that we'll look a bit scattered (and contributors will be confused) going from nested to non-nested and back to nested in the span of less than 2 years.
**Against keeping lazy loader**
(I've kept these in the same order as @cbrnr's original post.)
1. It's not a built-in feature of Python (and likely never will be).
2. It uses `.pyi` files for purposes they weren't intended for.
3. The speedup doesn't matter. "Delaying imports until first use might be more disruptive than a slightly longer initial import time."
4. lazy_loader is not actively maintained.
**Response to arguments against keeping it**
> It's not a built-in feature of Python.
I can't fathom why this should hold any weight. Lots of things will never make it into the standard library.
> It uses `.pyi` files for purposes they weren't intended for.
This is the only argument I find somewhat convincing. It seems unlikely, but it is possible that Python will someday set stricter rules about `.pyi` files that would interfere with them being used in this way. More concerning to me, though, is what might happen when we decide to make MNE fully type-annotated and decide that we want/need the `.pyi` files for, e.g., multiple dispatch definitions.
> The speedup doesn't matter.
See point (1) in favor of keeping lazy_loader.
> lazy_loader is not actively maintained.
Maybe. I'd say it's a bit too early to tell. In https://github.com/scientific-python/lazy-loader/issues/128 Stéfan said "ping me if you don't get it sorted out" and he was never pinged again, so I feel it's unfair to count that as evidence for lack of maintenance. There's also a bit of a built-in backstop, in that lazy_loader is housed within https://github.com/scientific-python which has a lot of very competent people who could step in / take over if Stéfan ever did go missing.
I think the ideal solution would be to not import everything into the global `mne` namespace, so having to `import mne.decoding` or even `import mne.io` would be perfectly fine for me. The package has grown a lot, and rarely does anyone need all of its functionality in their work. This would be a Pythonic solution in literally every way:
- It would make `import mne` at least as fast as with the lazy loader.
- We don't need to nest any imports.
- It is the most expected way to do imports in Python, which means the least amount of churn for developers.
- It does not misuse `.pyi` stubs.
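A minimal sketch of what that proposal could look like (`mypkg` and its modules are hypothetical):

```python
# mypkg/__init__.py -- only lightweight core names are imported eagerly;
# heavy submodules are deliberately left out of the top-level namespace.
from .core import read_raw  # cheap import, always available

# No `from . import viz, decoding` here. Users opt in explicitly:
#
#     import mypkg.viz
#     mypkg.viz.plot_something(...)
```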
I don't want to reiterate my points (they haven't changed), but I do want a consensus solution, and I am definitely not forcing anything onto the project. If anyone sees any other options, please let us know! Maybe there is a way to keep the current lazy loader but get rid of the `.pyi` stubs (I would not prefer this, but it would still be better than the status quo)?
Just one quick response to the point on CI runs: if we switch from pip to `uv pip` (not even plain `uv`), we get the following install times for `mne[dev]` (with cached downloads, of course):

- `pip install "mne[dev]"`: 22 s
- `uv pip install "mne[dev]"`: 0.35 s
So about a 62× speedup, which I think makes arguing about 200 ms longer import times a moot point.
Marginally (?) related, but has anyone considered that the metrics we're using here are not really useful anyway? Just comparing `import mne` times is pointless; one needs to measure the cumulative import time until everything is loaded that is needed to perform a certain action or analysis. Making everything import "lazily", i.e., deferring most imports until later, and then just omitting them from the measurements is probably not a valid approach.
> I think the ideal solution would be to not import everything into the global `mne` namespace, so having to `import mne.decoding` or even `import mne.io` would be perfectly fine for me.
To me this is a non-starter. Interactive API exploration is something I use a lot, both with MNE (where sometimes I forget the exact name of a func, so I `mne.viz.<TAB>` to find it) and to explore what an unfamiliar package offers.
> the metrics we're using here are not really useful
It's true that the import "savings" don't just disappear; if you use `mne.decoding` then the import overhead of that submodule will occur when it is first used. But as a maintainer, most of my interactive MNE sessions are "quick reproducers" for debugging; I'd say more than 90% of my sessions never use `mne.decoding` (or `beamformer`, or `inverse_sparse`, or ...). To spell it out: each session suffers the import time of only the modules needed/used.
(side note: this will become a much bigger deal when #13059 lands, because hedtools has an import time between 1.5 and 2 seconds. Sure, it can be nested inside the HEDAnnotations class, but there are other reasons mentioned above why nesting imports is undesirable.)
TL;DR: comparing the `import mne` time is not a 100% fair comparison, but it's also not "pointless". It tells you the maximum time savings per import. I don't have data on which submodules are most commonly used in the interactive sessions of all of our users, but my personal experience suggests that most of the time, most of the submodules aren't needed.
> To me this is a non-starter. Interactive API exploration is something I use a lot, both with MNE (where sometimes I forget the exact name of a func, so I `mne.viz.<TAB>` to find it) and to explore what an unfamiliar package offers.
I don't use it at all; I guess we will not be able to come up with a statistic on how many users require that feature. In addition, you can still interactively explore if you first `import mne.viz` (to stick with your example, it would be sufficient to just not export the largest/slowest modules to the `mne` namespace).
> you can still interactively explore if you first `import mne.viz`
which is even slower than importing everything eagerly in the first place.
> (to stick with your example, it would be sufficient to just not export the largest/slowest modules to the `mne` namespace).
which breaks interactive API exploration. How is that "sufficient" exactly?
Today the Steering Council discussed the question of reverting our adoption of the lazy_loader package. Here is our consensus:
- For end users, there are 3 major advantages to the status quo of lazy_loader adoption:
    - interactive API exploration using `TAB` works the way users want/expect, for all submodules of MNE.
    - initial import of `mne` is fast; the import time of dependencies is incurred on a per-submodule basis, and (crucially) only when those submodules are accessed.
    - scikit-learn is an optional dependency (as it was before `lazy_loader` adoption).
- For maintainers, there are 2 additional considerations:
    - our prior practice of nesting certain SciPy submodules was burdensome for maintainers, as contributors often didn't know about the requirement, and often weren't savvy enough to notice the associated "import nesting" test failure, interpret its failure message, and fix the problem themselves.
    - we may someday want/need to use the `.pyi` files for their intended purpose (e.g., defining type annotations for cases of multiple dispatch in our codebase). It is unknown whether the lazy_loader usage of `.pyi` files will interfere with our ability to do this (though our hunch is that there probably won't be any conflict).
- Making scikit-learn a required dependency is a non-starter, as it will add considerably to the import time. On my (fast desktop) system, `time python -c "import sklearn"` takes 650-690 ms, which would be in addition to the current ~330 ms for MNE, plus any additional slowdowns due to non-lazy import of SciPy submodules or other dependencies (which have been estimated at around 200 ms, even with nesting of the more expensive SciPy submodules).
- The relevance or strength of reasons for not adopting lazy_loader in the first place is, in our view, not necessarily equivalent to the relevance or strength of reasons for dropping lazy_loader after adoption. In our opinion, turning fully-functional interactive API exploration into a feature that doesn't work for all submodules is a regression for end users, and regressions are a harder sell than simply maintaining an imperfect status quo. We think some users will notice such a regression, be confused/annoyed, and complain, which costs them time and costs us time.
- Many reasons presented in favor of reverting lazy_loader adoption are either hypothetical/speculative (lazy_loader might become unmaintained) or appeal to what is most "Pythonic". We fully acknowledge that `lazy_loader` uses `.pyi` files for a purpose they were not intended for, and that our `__init__.py` files now look different from those of most Python packages. However, from a practical perspective, since #12072 landed, we haven't seen any evidence that `lazy_loader` is actually causing problems with the functioning of our software, for users, for our maintainers, or for downstream maintainers. In particular:
    - this was a case of a user omitting a `pyinstaller` flag
    - this issue in the lazy_loader repo doesn't affect MNE-Python
    - this other issue in the lazy_loader repo also doesn't affect MNE-Python
With all that said, the Steering Council would welcome a change that removes lazy_loader, as long as:
- scikit-learn is still an optional dependency
- interactive API exploration with `TAB` still works for all submodules
- initial `import mne` time is not substantially worse than it was prior to https://github.com/mne-tools/mne-python/pull/11838
Without having attempted it, we are cautiously optimistic that the approach taken by SciPy (which @larsoner linked to previously) might satisfy those conditions.
Regarding the other points made above:
- we are willing to return to a situation where expensive imports are nested, even if it brings back the maintenance burden of guiding contributors who are unaware of that requirement.
- we are willing to weaken/abandon the three criteria stated above if it turns out that lazy_loader's use of `.pyi` files does indeed interfere with comprehensive type annotation, and comprehensive type annotation is actually undertaken by one of our maintainers/contributors. (Note that there are plans to do comprehensive type annotation that are pending funding approval, so it's not totally hypothetical.)
I wanted to add to this discussion a note that there is a new PEP for adding explicit lazy loading to Python. It is currently under (very) active discussion here, but having followed that discussion fairly closely for the last week, my impression is that the community is mostly aligned in favor of the PEP and its approach. This makes me cautiously optimistic that it will be accepted, and this in turn motivates the following statement:
If anyone was planning to attempt replacing our use of lazy_loader with the approach SciPy uses (or some other approach we haven't thought of yet), please consider holding off until the PEP is reviewed, because if it is accepted, we might as well transition straight to the standard-library solution once it becomes available.
This is great news! I'm all for an official solution, and if this really lands I'm happy to work on a PR.
This proposal looks very nice indeed :)
Update: PEP 810 was accepted on Monday.