mne-python icon indicating copy to clipboard operation
mne-python copied to clipboard

Add type hints

Open cbrnr opened this issue 1 year ago • 17 comments

What do people think about adding type hints to MNE? I know that this can be a controversial topic, and I myself have not been known to be a fan of type hints. However, if done right, I think they can be extremely valuable.

Since the codebase of MNE is large and does not have any type hints, it is not feasible to even try to type everything. This is only an option for new or smaller projects. However, we could start by adding type hints where they actually matter for users, and return values immediately come to mind.

For example, take the following code snippet:

import mne

raw = mne.io.read_raw("some_file.edf")

Currently, static type analyzers like Pylance in VS Code know nothing about the type of raw, so when users want to call a method (or see a list of available methods), VS Code does not provide any completions whatsoever.

raw.  # no completions at all

By adding -> mne.io.Raw to the definition of mne.io.read_raw(), we suddenly get the full list of methods:

Screenshot 2023-11-28 at 07 35 02

Therefore, the reader functions would be a good place to start adding return types.

cbrnr avatar Nov 28 '23 06:11 cbrnr

I think this example is very convincing and seems to be a clear user improvement.

I would just be cautious to avoid adding types everywhere. Typing that is not tested can become very quickly wrong and detrimental and using mypy to test our doc is huge effort in development, maintenance and CI time.

agramfort avatar Nov 28 '23 07:11 agramfort

💯

Type hints for return values at least for the most commonly used objects would improve the user experience tremendously. Let's do it.

There will certainly be cases where the return type depends on input parameters, so we'd need to add overloads. But I think that'll be manageable as well.

None of these changes has runtime implications, so it's okay to make mistakes and fix them later.

hoechenberger avatar Nov 28 '23 07:11 hoechenberger

Typing that is not tested can become very quickly wrong and detrimental and using mypy to test our doc is huge effort in development, maintenance and CI time.

Agreed, to me the blocker (and what I don't know how to do) is to have some test/CI that checks that any added type hints are actually correct

larsoner avatar Nov 28 '23 14:11 larsoner

I only know how to type check entire projects, and I don't know if checking is even possible for gradually typed projects (e.g. where we only added return types). I don't think the type checker can check for correctness in this case, because it doesn't know what the correct type should be if nothing else is typed.

cbrnr avatar Nov 28 '23 14:11 cbrnr

My take is: We don't need to add any checks.

We simply add type hints manually and every time we suspect that our editors are suggesting something that might be incorrect, we update the hint. Let's keep things simple!

hoechenberger avatar Nov 28 '23 14:11 hoechenberger

We don't need to add any checks.

We simply add type hints manually and every time we suspect that our editors are suggesting something that might be incorrect, we update the hint. Let's keep things simple!

For what it's worth, historically any time I can think of we've had this sort of policy of in the past and then managed to actually add a check, we found all sorts of errors (just to name a few off the top of my head: public function docstring existence, docstring formatting, docstring/option completion and consistency, sphinx warnings-as-errors, flake checks very early on). So I'd caution against thinking about keeping things simple here (via manual checking and intervention) as a good thing. But for now to move forward I agree we have to live with reviewers and users knowing and carefully checking any added type hints :sweat:

Maybe for now the rule should be only to add type hints to returns (not parameters) to our most-used functions? Maybe even just start with a first PR to add hints to reading functions like mne.io.read_raw_* and mne.read_epochs / mne.read_evokeds or so and we can see how it works.

For adding to params maybe we'd do it someday but I think we'd want https://github.com/numpy/numpydoc/issues/196 to be resolved to avoid duplication in docstrings and type hints

larsoner avatar Nov 28 '23 14:11 larsoner

we have to live with reviewers and users knowing and carefully checking any added type hints 😓

I really wouldn't worry much about this. When a return type hint is wrong, my editor will propose nonsensical tab completions or put red squiggly lines in places where they don't make sense. And if I blindly follow the incorrect suggestions, I will get a runtime error. Devs will immediately realize what is wrong, and users will report bugs at least in the latter case. So ... yeah, I'm not too worried, even though I do see your general concerns.

Maybe for now the rule should be only to add type hints to returns (not parameters) to our most-used functions?

Yes.

Return types, however, sometimes depend upon input parameters. In this case, you'd normally need to add overloads and also type at least the respective input parameters. However, what we could do instead for the beginning is simply annotate the return type as the union of all possible return types. Doesn't provide the greatest UX but is already a step in the right direction. (Example: read_evokeds may return a list of evokeds or a single evoked)

hoechenberger avatar Nov 28 '23 17:11 hoechenberger

For what it's worth, historically any time I can think of we've had this sort of policy of in the past and then managed to actually add a check, we found all sorts of errors (just to name a few off the top of my head: public function docstring existence, docstring formatting, docstring/option completion and consistency, sphinx warnings-as-errors, flake checks very early on). So I'd caution against thinking about keeping things simple here (via manual checking and intervention) as a good thing. But for now to move forward I agree we have to live with reviewers and users knowing and carefully checking any added type hints 😓

Definitely, but try running Mypy on our current code base – you will get tons of errors (some of them probably legit, but the majority because nothing is typed). So it's basically impossible to add checks if we're planning to gradually add type hints where they make sense. And return types for the most important functions make the most sense to me.

For adding to params maybe we'd do it someday but I think we'd want https://github.com/numpy/numpydoc/issues/196 to be resolved to avoid duplication in docstrings and type hints

I've been waiting for this for a long time, and it's probably never going to happen. I've switched to MkDoc (and MkDocstrings) for this reason (not just for this reason, but this one was a major one).

cbrnr avatar Nov 28 '23 17:11 cbrnr

For adding to params maybe we'd do it someday but I think we'd want numpy/numpydoc#196 to be resolved to avoid duplication in docstrings and type hints I've been waiting for this for a long time, and it's probably never going to happen

The solution to that problem is really simple: drop the types from the docstrings once they're in the signature. PyCharm, for example, already does that automatically even with un-typed projects that use Numpydoc-formatted docstrings: when you view the documentation of a function, the types show up in the documentation's function signature, while in fact they were extracted from the docstring. And in the parameter list, you only see the names and descriptions of the parameters, not their types. Very simple and effective.

hoechenberger avatar Nov 28 '23 22:11 hoechenberger

The solution to that problem is really simple: drop the types from the docstrings once they're in the signature.

That will make our rendered docs no longer give the types (with links etc.)

larsoner avatar Nov 29 '23 00:11 larsoner

... at least I think that's the case based on the NumpyDoc issue

larsoner avatar Nov 29 '23 00:11 larsoner

Yes, hence my switch to MkDocs, which does support pulling type hints into the rendered docstrings.

cbrnr avatar Nov 29 '23 05:11 cbrnr

I only know how to type check entire projects, and I don't know if checking is even possible for gradually typed projects (e.g. where we only added return types). I don't think the type checker can check for correctness in this case, because it doesn't know what the correct type should be if nothing else is typed.

Forgive the naive question, but I don't see anyone suggesting that we follow the advice from the mypy docs, namely: start small (by initially ignoring most sub-modules so that imports aren't followed), get mypy to run successfully before adding any annotations, and then adding it to our CIs (to prevent regressions) while we work on adding type hints. What is wrong with that approach? I get that it won't solve the vscode vs pycharm dilemma, but it is surely at least a step in the right direction that won't rule out any possible solutions to that dilemma later on?

drammock avatar Nov 29 '23 18:11 drammock

Sure, this will be the approach we will have to take. I was referring to the usefulness of type checking, quoting from the Mypy docs:

The more you annotate, the more useful mypy will be, but even a little annotation coverage is useful.

But we gotta start small, sure.

cbrnr avatar Nov 29 '23 18:11 cbrnr

I just wanted to start working on this, adding return type hints to our mne.io.read_raw_* functions, but I discovered that most (all?) readers only return a BaseRaw child, except for the FIFF reader, which returns Raw; however, our docstrings claim that all readers return Raw. What gives? Should we annotate as the specific BaseRaw child each reader implements (e.g., RawArtemis123)? Or as BaseRaw? Or (incorrectly) as Raw?

This also makes annotating read_raw() more difficult. Best we can do there is probably annotate as BaseRaw.

Edit: Please note I edited my comment, I had a typo in there where I wrote Raw in a place where it should have been BaseRaw.

hoechenberger avatar Nov 29 '23 18:11 hoechenberger

Raw is the publicly documented class. In theory we could document BaseRaw instead and link there. Or we could document all raw subclasses and (properly) type annotate that we use them. But this might confuse people more than help since really we want them to know that what they get back has all the BaseRaw methods and attributes.

larsoner avatar Nov 29 '23 19:11 larsoner

Raw is the publicly documented class. In theory we could document BaseRaw instead and link there. Or we could document all raw subclasses and (properly) type annotate that we use them. But this might confuse people more than help since really we want them to know that what they get back has all the BaseRaw methods and attributes.

Maybe we should simply rename Raw to RawFIFF and rename BaseRaw to Raw?

hoechenberger avatar Nov 29 '23 19:11 hoechenberger