typeshed icon indicating copy to clipboard operation
typeshed copied to clipboard

What to do when the runtime lies?

Open tungol opened this issue 1 year ago • 7 comments
trafficstars

I've been cleaning up inheritance mismatches. I've found that several modules with a C implementation fudge the module name of their classes. For example:

ReferenceType is a class defined in the _weakref C implementation, and then imported from there into the weakref module. Because it's a C implementation, it's free to declare it's own qualified name, and calls itself weakref.ReferenceType: https://github.com/python/cpython/blob/main/Objects/weakrefobject.c#L368C1-L368C1

The intention, I believe, being that this class canonically lives in the weakref module, and the fact that it's being defined in _weakref is an implementation detail which is hidden at runtime. Typeshed declares the class in _weakref.pyi, which means it gets a qualified name of _weakref.ReferenceType

I can see two possible arguments:

  • Typeshed should continue to follow the layout of implementation over the declared implementation. e.g. ReferenceType is defined in _weakref originally, so that's the correct place for it.

OR

  • Typeshed should follow the declared implementation over the layout. e.g. ReferenceType considers itself to live in the weakref module, that's where the majority of users will get it from, so that's the correct place for it.

This pattern pops up in a lot of places: in _ast, _decimal, and _sqlite3 are some of the others I've noticed. I suspect the current state of typeshed is inconsistent on this issue, since not all modules that have a C implementation currently have a _foo.pyi stub file.

Is there a consensus among typeshed's maintainers here? I'm a little biased towards option 2 at the moment, since I'm trying to make as many inheritances match as I can, but I think there's a good argument for either side, and this is, in the end, a very minor issue with not much practical effect.

tungol avatar Dec 10 '23 21:12 tungol

Generally we try to follow the actual implementation, until there is a good reason not to. Typing private modules is "better", but it was often neglected, because practically speaking there isn't too much difference, and is slightly more work.

srittau avatar Dec 11 '23 13:12 srittau

I think what I mean is that it's not obvious what "follow the implementation" means here, because different parts of the implementation are conflicting with each other.

If we have:

_weakref.pyi

class ReferenceType: ...

weakref.pyi

from _weakref import ReferenceType as ReferenceType

Then we have followed the import structure accurately, but ReferenceType has a different qualname in typeshed and at runtime.

If we have instead:

_weakref.pyi

from weakref import ReferenceType as ReferenceType

weakref.pyi

class ReferenceType: ...

Now ReferenceType has a matching qualname in typeshed and at runtime, but we've had to reverse the import structure in order to do that. I'd make the case that the second version follows the implementation a little better, since the qualname is visible to mypy/stubcheck, while (I don't think) the import structure really is. I might be wrong about that?

At any rate, I wasn't suggesting that we shouldn't make these available in the private module namespace.

tungol avatar Dec 11 '23 18:12 tungol

This is a great question. I don't have a strong opinion, but I lean towards:

Typeshed should follow the declared implementation over the layout. e.g. ReferenceType considers itself to live in the weakref module, that's where the majority of users will get it from, so that's the correct place for it.

I have this preference for two reasons:

  • It results in better user facing diagnostics
  • It better reflects the truth in alternative Python implementations or cases where C accelerator modules are missing

This is also why in the "various inheritance" patch from #3968 I have a few cases of special casing underscore prefixed modules and classes: https://github.com/python/mypy/compare/master...hauntsaninja:mypy:stubtestbaseclass#diff-55639a1f38efeacbb10988649399037c13ad11e9c82890b453d98ea2bed9caf8R267

hauntsaninja avatar Dec 13 '23 00:12 hauntsaninja

I made several MRs along these lines, but then I discovered that the _pydecimal module changes its __name__ attribute, which I hadn't though of as an option: https://github.com/python/cpython/blob/d91e43ed7839b601cbeadd137d89e69e2cc3efda/Lib/_pydecimal.py#L151

Which got me thinking: what if mypy could detect cases where a module level __name__ = "foo" is set in the stubs, and set the name appropriately? Then we could follow both parts of the implementation, layout and naming.

I'm willing to take a run at that MR, if people think it's a reasonable idea.

tungol avatar Dec 17 '23 00:12 tungol

As it turned out, classes which set their own __module__ attribute were a lot easier to handle than modules which set their own __name__ attribute. I have a draft implementation for that here: https://github.com/python/mypy/pull/16698

With that, typeshed could set __module__ on classes which claim to be from a different module, and stubcheck could perform more accurate comparisons with runtime.

tungol avatar Dec 22 '23 00:12 tungol

There are a few pull requests that add missing _foo modules (#11187 #11155 #11159 #11174). What problem does that solve? In other words, when does a typeshed user want stubs for a _weakref or _io or _ssl module?

If I understood correctly, we decided to do this (which IMO is a good idea for several reasons):

e.g. ReferenceType considers itself to live in the weakref module, that's where the majority of users will get it from, so that's the correct place for it.

And if you have everything you'll ever need in weakref, then why do you need _weakref?

Note that I'm talking about adding _foo modules for completeness. I'm not saying that we should delete existing _foo stubs. For example, _tkinter stubs contain various things that aren't exposed in tkinter, but can be useful when working with tkinter.

Akuli avatar Jan 03 '24 09:01 Akuli

I created those while working on naming and inheritance - those all contain something which is a parent class of something else, so adding them brings the inheritance structure in typeshed closer to what it looks like at runtime. In other cases, it's a naming issue: the mod.Foo type calls itself _mod.Foo at runtime, and adding the _mod module lets typeshed align names that way.

It's a fair question whether that's justified in all cases or not. But that's not actually the subject of this ticket. Note that typeshed already has _weakref, and aligning the naming there would mean moving weakref.ReferenceType out of _weakref (where it's currently defined by both typeshed and runtime) and into weakref (where it will match the name given by runtime).

For another example and one more on subject: I've been investigating the issue of decoupling collections.abc from it's typing aliases (https://github.com/python/typeshed/issues/6257). Runtime defines these classes in _collections_abc.py rather than collections/abc.py so that they can be imported without incurring the performance hit of also importing collections/__init__.py, but then sets __name__ = "collections.abc" in that file so that the various clases call themselves collections.abc.Foo instead of _collections_abc.Foo.

Typeshed has the same performance concerns; If we define them in collections.abc, that's an increase in the number of files involved in the circular dependency cycle that contains typing and builtins. I haven't tested with collections.abc, but making sure that that mypy can resolve that dependency cycle fast enough was already something I had to pay attention to while doing some tests that move these classes from typing.pyi into _collections_abc.pyi. So I think defining them in _collections_abc.pyi but adjusting their naming with either the module-level __name__ = directive or a class-level __module__ = directive is the best option.

tungol avatar Jan 03 '24 18:01 tungol