
Type stubs for single-file top-level modules

Open not-my-profile opened this issue 2 years ago • 30 comments

PEP 561 currently states the following:

Package maintainers who wish to support type checking of their code MUST add a marker file named py.typed to their package supporting typing.

I consider this requirement to be problematic for Python libraries that are written in another programming language and distributed as compiled .so files. PEP 561 currently provides no way to mark .so files residing directly in site-packages/ as typed, so typed shared libraries need to introduce an intermediary __init__.py file such as the following:

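```python
# my_project/__init__.py: re-export the public API of the compiled extension.
# Importing from the submodule also binds `_native` in this package namespace.
from ._native import *

__doc__ = _native.__doc__
if hasattr(_native, "__all__"):
    __all__ = _native.__all__
```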

While this works for static type checkers, I think it is obviously suboptimal because it has several undesired side effects. Let's take the following as an example:

site-packages
├── my_project
│   ├── __init__.py
│   ├── _native.cpython-36m-x86_64-linux-gnu.so
│   └── py.typed

The unintended side-effects are:

  1. You can import my_project._native.
  2. _native shows up in the documentation generated by pydoc: it is listed under PACKAGE CONTENTS in the documentation of my_project, and help(my_project.foobar) reports that foobar resides in the module my_project._native.
  3. my_project.__file__ now points to the __init__.py file instead of the .so file, potentially misleading developers into thinking the package is implemented in Python.
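A minimal sketch to observe side effects 1 and 3 without a compiler, assuming a pure-Python `_native.py` stands in for the compiled `_native.cpython-36m-x86_64-linux-gnu.so` (a substitution for illustration only; the `foobar` function and its docstring are hypothetical). It builds the layout in a temporary directory and inspects the result:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the compiled extension: a plain Python module.
NATIVE_SRC = '"""Native docs."""\n__all__ = ["foobar"]\n\ndef foobar():\n    return 42\n'

# The re-exporting __init__.py workaround from the issue.
INIT_SRC = (
    "from ._native import *\n"
    "__doc__ = _native.__doc__\n"
    "if hasattr(_native, '__all__'):\n"
    "    __all__ = _native.__all__\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "my_project"
    pkg.mkdir()
    (pkg / "_native.py").write_text(NATIVE_SRC)
    (pkg / "__init__.py").write_text(INIT_SRC)
    (pkg / "py.typed").write_text("")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        my_project = importlib.import_module("my_project")
        # Side effect 1: the private submodule is importable on its own.
        native = importlib.import_module("my_project._native")
        # help() reports this module name for the re-exported function.
        foobar_module = my_project.foobar.__module__
        # Side effect 3: __file__ points at the shim, not the extension.
        file_name = Path(my_project.__file__).name
        doc = my_project.__doc__
    finally:
        sys.path.remove(tmp)
        for name in ("my_project", "my_project._native"):
            sys.modules.pop(name, None)

print(native.__name__, foobar_module, file_name)
```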

So I really think PEP 561 should be amended to provide some way of marking single-file packages as "typed" without having to resort to hacks such as defining an intermediary __init__.py, since that introduces a bunch of undesired side effects that have the potential to confuse API users.

What do you think about this?

not-my-profile, Jan 09 '23 04:01

I agree this is suboptimal, and I'd support lifting the restriction if we can come up with a good way to do it. Maybe @ethanhs has some insights into why we didn't provide a way to type single-module packages at the time.

The most obvious solution would perhaps be to put something in the package's dist-info directory, e.g. a new key in the METADATA file. The problem is that type checkers can't reliably go from the name of the installed module to the dist-info directory, because the names may not match.

JelleZijlstra, Jan 09 '23 05:01

A very simple solution could be to create a .typed marker in the same directory by appending .typed to the filename of the shared library. For example, site-packages/myproject.cpython-36m-x86_64-linux-gnu.so could be marked as typed by creating the file site-packages/myproject.cpython-36m-x86_64-linux-gnu.so.typed.

I am unfamiliar with the process of amending a PEP, are there other places where I should announce this discussion? PEP 561 has been marked as "Final", does this mean that introducing such an update would require a new PEP?

not-my-profile, Jan 09 '23 05:01

This would technically require a new PEP, yes. However, it can be short.

JelleZijlstra, Jan 09 '23 05:01

I think just {module_name}.typed would work and be easier for type checkers to search for

hauntsaninja, Jan 09 '23 05:01

Good point, Shantanu! Oh ... I just realized something ... static type checkers could just as well check for {module_name}.pyi within site-packages/ ... no need for a separate marker file at all.
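A sketch of the lookup this suggests, assuming the checker already knows its list of site-packages directories (for example from `site.getsitepackages()`); `find_top_level_stub` is a hypothetical helper, not any checker's actual API:

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_top_level_stub(module_name: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first {module_name}.pyi sitting directly in a search dir, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / f"{module_name}.pyi"
        if candidate.is_file():
            return candidate
    return None


# Demo against a throwaway directory standing in for site-packages.
with tempfile.TemporaryDirectory() as site_dir:
    (Path(site_dir) / "myproject.pyi").write_text("def f() -> int: ...\n")
    found = find_top_level_stub("myproject", [site_dir])
    missing = find_top_level_stub("otherproject", [site_dir])
    found_name = found.name if found else None

print(found_name, missing)
```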

Ok thanks Jelle, I'm working on a PEP draft for this right now.

not-my-profile, Jan 09 '23 06:01

I think .pyi as a marker file would only work for single-module extension modules, not for single-module pure Python files with inline types.

hauntsaninja, Jan 09 '23 06:01

Right, but I don't really consider having to change foo.py to foo/__init__.py (to mark it as typed with inline types) to be problematic, because:

  • documentation generators know how to deal with __init__.py files since they are so prevalent (e.g. pydoc doesn't list __init__ under PACKAGE CONTENTS and help displays in module foo instead of in module foo.__init__).
  • foo.__file__ directly points to the source code

So the drawbacks I described that exist for shared libraries pretty much don't apply to pure Python modules.

I don't think we should introduce yet another type of .typed marker file since that is bound to result in confusion. What are the differences between foo.typed and py.typed? You can put partial\n in py.typed but not in foo.typed (since the stubs of single-file modules cannot be partial) so we would have two different types of marker files with the same extension, which is just confusing. Even worse there could be a package named py (in fact there is one on PyPI), so site-packages/py.typed would have different semantics than site-packages/foo/py.typed ... which again is confusing. I guess we could deal with that by introducing a .typed-module extension but this again isn't as clean since py.typed already exists and is named py.typed instead of py.typed-package.
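To make the distinction concrete, a rough sketch of how a checker might read a package-level py.typed marker, treating a line consisting of `partial` as the PEP 561 partial flag (the helper name and its three return values are assumptions for illustration):

```python
import tempfile
from pathlib import Path


def py_typed_status(package_dir: Path) -> str:
    """Return "untyped", "partial" or "typed" based on package_dir/py.typed."""
    marker = package_dir / "py.typed"
    if not marker.is_file():
        return "untyped"
    lines = {line.strip() for line in marker.read_text(encoding="utf-8").splitlines()}
    return "partial" if "partial" in lines else "typed"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # foo: fully typed package; bar-stubs: partial stub-only package.
    for name, content in [("foo", ""), ("bar-stubs", "partial\n")]:
        (root / name).mkdir()
        (root / name / "py.typed").write_text(content)
    (root / "baz").mkdir()  # no marker at all
    statuses = {p.name: py_typed_status(p) for p in sorted(root.iterdir())}

print(statuses)
```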

I think we should rather go with the intuitive solution of putting a .pyi file in site-packages.

not-my-profile, Jan 09 '23 07:01

I think we should rather go with the intuitive solution of putting a .pyi file in site-packages.

I agree this is the most straightforward approach for single-file binary (compiled) packages. Pyright already supports a ".pyi" file in site-packages if it's present. It looks like mypy doesn't handle this currently, but I'm guessing it would be a simple change.

erictraut, Jan 09 '23 07:01

I wrote a draft PEP "Type stubs for single-file top-level modules". Feedback is very much welcome :)

(This is my first attempt at writing a PEP.)

not-my-profile, Jan 09 '23 08:01

  • I remember seeing some grumbling about needing py.typed even for single packages, but can't find it now. Maybe it was a package that wanted to remain distributed as a single .py file and therefore wanted typeshed stubs?
  • Relevant related issue: #1061
  • The PEP should clarify where in the resolution order defined in PEP 561 (https://peps.python.org/pep-0561/#type-checker-module-resolution-order) the stub should go. (Presumably under # 4.) What happens if there is both a foo.pyi and a foo/ directory with a py.typed in it?
  • You'll need a PEP sponsor; I'm happy to do it, but @hauntsaninja or @AlexWaygood are also eligible and I have several PEPs in flight already.

JelleZijlstra, Jan 09 '23 13:01

I'm supportive of the idea and I'm happy to co-sponsor :)

However, I wouldn't want to be the sole sponsor — I haven't written or sponsored a PEP before, so I'm not 100% sure of the exact process. I also feel like I'm pretty close to full capacity on my open-source commitments at the moment.

AlexWaygood, Jan 09 '23 13:01

  • I remember seeing some grumbling about needing py.typed even for single packages, but can't find it now.

I've seen this in a few places; I'll also see if I can dig up some references.

AlexWaygood, Jan 09 '23 13:01

Thanks for putting the effort into writing this up (and starting this discussion in the first place)!

Given that PEP 561 solves 95% of this problem, I feel that if we want to change the standards here, we shouldn't solve only half of the remaining 5%.

I'm not sympathetic to the claim that pure Python developers don't feel the drawbacks you mention, mainly because I think 2/3 of those drawbacks are very weak: "you can import my_project._native" (so what? consenting adults), "potentially misleading developers into thinking the package is implemented in Python" (such developers would also believe that numpy is in pure python)...

...I think the biggest reason to do this is just "it's annoying to complicate project layout because of a dumb shortcoming of type checkers" and "why have package when you can have module, simple is better than complex". This applies equally to single pure Python modules with inline types and single extension modules.

hauntsaninja, Jan 09 '23 20:01

^I agree with everything @hauntsaninja just said; I also think it would be a real shame not to find a way to solve this for single-file pure-Python packages.

AlexWaygood, Jan 09 '23 21:01

The suggested approach of putting a .pyi file in site-packages next to the implementation file would also work for pure-Python packages, right?

JelleZijlstra, Jan 09 '23 21:01

Yes, but not for inline types, which I strongly encourage as the most maintainable way to add types to pure Python

hauntsaninja, Jan 09 '23 21:01

Ah right, good point.

Possible hacky solution: Use a .pyi file, but put some special marker code in the .pyi file that indicates "look inline".

JelleZijlstra, Jan 09 '23 21:01

Use a .pyi file, but put some special marker code in the .pyi file that indicates "look inline".

__py_typed__ = True?

So, if type checkers see a .pyi file with just that, and nothing more, they know to look in the equivalent .py file for inline types?
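If that idea were adopted, detecting such a "look inline" stub could be as simple as this sketch, which assumes the stub must contain exactly one statement, `__py_typed__ = True` (the marker name is only the proposal above, not an existing convention). The stub is parsed, never executed:

```python
import ast


def is_look_inline_stub(stub_source: str) -> bool:
    """True if the .pyi consists solely of `__py_typed__ = True`."""
    body = ast.parse(stub_source).body
    if len(body) != 1 or not isinstance(body[0], ast.Assign):
        return False
    node = body[0]
    return (
        len(node.targets) == 1
        and isinstance(node.targets[0], ast.Name)
        and node.targets[0].id == "__py_typed__"
        and isinstance(node.value, ast.Constant)
        and node.value.value is True
    )


print(is_look_inline_stub("__py_typed__ = True\n"))
```

Any other content (real stub definitions, or the marker set to something other than `True`) means the file is treated as an ordinary stub.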

AlexWaygood, Jan 09 '23 22:01

Thanks everybody :)

I remember seeing some grumbling about needing py.typed even for single packages, but can't find it now.

There is #1297.

The PEP should clarify where in the resolution order defined in PEP 561 (https://peps.python.org/pep-0561/#type-checker-module-resolution-order) the stub should go. (Presumably under # 4.)

Yes I agree with that.

What happens if there is both a foo.pyi and a foo/ directory with a py.typed in it?

If both foo.pyi and foo/__init__.pyi + foo/py.typed exist I think type checkers should do the following:

  • use foo/__init__.pyi if foo/__init__.py exists (because Python's import statement also prefers packages over single-file modules if both exist in the same directory)
  • otherwise use foo.pyi
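The proposed tie-break can be sketched as follows, assuming every candidate lives directly inside one site-packages directory. `choose_stub` is a hypothetical helper, and its last fallback (typed package stubs with no foo.pyi competitor) is an assumption beyond the two rules above:

```python
import tempfile
from pathlib import Path
from typing import Optional


def choose_stub(site_dir: Path, name: str) -> Optional[Path]:
    single = site_dir / f"{name}.pyi"
    pkg = site_dir / name
    pkg_stub = pkg / "__init__.pyi"
    pkg_is_typed = pkg_stub.is_file() and (pkg / "py.typed").is_file()

    # Rule 1: a real package (foo/__init__.py) wins, mirroring the import system.
    if pkg_is_typed and (pkg / "__init__.py").is_file():
        return pkg_stub
    # Rule 2: otherwise prefer the single-file stub foo.pyi.
    if single.is_file():
        return single
    # Assumed fallback: only the typed package stub exists.
    if pkg_is_typed:
        return pkg_stub
    return None


def touch(path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("")


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    for rel in ("foo.pyi", "foo/__init__.pyi", "foo/py.typed"):
        touch(site / rel)
    without_init = choose_stub(site, "foo").relative_to(site).as_posix()
    touch(site / "foo" / "__init__.py")
    with_init = choose_stub(site, "foo").relative_to(site).as_posix()

print(without_init, with_init)
```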

@hauntsaninja I agree that two of the three drawbacks are weak. I think the main drawback is that tooling such as documentation generators generally doesn't have special support for recognizing such re-exporting __init__.py files.

I am alright with also solving the problem for pure-Python modules with inline types. My initial idea was to create a .pyi file next to the .py file as a symbolic link to the .py file; however, since symbolic links aren't supported in wheels, that unfortunately does not appear to be an option.

I think I'd rather introduce a new file extension for new marker files (e.g. .typed-module) than overload the meaning of existing file extensions such as .typed or .pyi. An empty {module}.typed-module file would then mean: use {module}.pyi if it exists; otherwise, if the module is implemented as {module}.py, look for inline types. This would allow for the following combinations:

  • {module}.*.so + {module}.pyi + {module}.typed-module (arguably the marker file is redundant in this case)
  • {module}.py + {module}.pyi + {module}.typed-module (arguably the marker file is redundant in this case)
  • {module}.py + {module}.typed-module (for inline types)

What do you think? Perhaps the .typed-module marker file should be optional if there exists a .pyi file? So you would actually only need to use it for inline types. And type checkers would only have to check for it if the .pyi doesn't exist.
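A sketch of that rule with the optional marker, assuming the hypothetical `.typed-module` extension proposed here (no type checker recognizes it today; `typing_source` is an illustrative name):

```python
import tempfile
from pathlib import Path


def typing_source(site_dir: Path, name: str) -> str:
    """Classify a top-level module under the proposed .typed-module rule."""
    if (site_dir / f"{name}.pyi").is_file():
        return "stub"  # marker optional: a top-level stub is enough
    marker = site_dir / f"{name}.typed-module"
    if marker.is_file() and (site_dir / f"{name}.py").is_file():
        return "inline"  # read annotations from {name}.py itself
    return "untyped"


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    for fname in ("a.py", "a.pyi", "b.py", "b.typed-module", "c.py"):
        (site / fname).write_text("")
    results = {m: typing_source(site, m) for m in ("a", "b", "c")}

print(results)
```

With the marker optional, a checker only has to stat the `.typed-module` file when no `.pyi` was found.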

not-my-profile, Jan 09 '23 22:01

There is already a way to handle inlined types for a single-file module: convert it to a multi-file package and add a "py.typed" marker. This isn't too onerous. Let's not add hacky and inconsistent solutions (like symbolic links or files with specific name extensions). Resist the urge to overreact to one or two people grumbling about needing to do a few extra steps (one time) to make this work.

There is currently no way to package type information for a single-file compiled module. The side-by-side ".pyi" file is an elegant solution to this currently-unsolved problem. Let's focus on solving this problem, not creating more complexity and inconsistencies to solve a problem that already has a solution.

erictraut, Jan 10 '23 00:01

I think a key difference between pure modules and extension modules that has not been mentioned here yet is that the __init__ module of a package always has to be a pure module. So while a pure top-level module can simply be moved to {module}/__init__.py, the same does not work for extension modules (because CPython does not pick up {module}/__init__.cpython-36m-x86_64-linux-gnu.so). The easy solution of moving {module}.py to {module}/__init__.py and adding a marker file therefore does not work for extension modules, which instead require an unintuitive and disadvantageous intermediary (re-exporting) __init__.py file.

I have strongly revised my PEP draft to better explain the reasoning (as well as answering the questions raised by @JelleZijlstra).

Sidenote: I have now also specified that these top-level .pyi files should be recognized in the 4th step of the module resolution order as per PEP 561. While looking at that order I think I have spotted an oversight in PEP 561, for which I have just opened #1334 in order to keep this discussion on topic.

@hauntsaninja There are three scenarios:

  1. a top-level extension module (which always needs a .pyi file)
  2. a top-level pure module with a type stub file (which always needs a .pyi file)
  3. a top-level pure module with inline types
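To tell these scenarios apart, a checker has to recognize a top-level extension module by its platform-specific suffix. One way is `importlib.machinery.EXTENSION_SUFFIXES`; the sketch below is illustrative only (`classify_top_level` and its labels are hypothetical), and it cannot distinguish case 3 from an untyped pure module, which is exactly the gap under discussion:

```python
import tempfile
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def classify_top_level(site_dir: Path, name: str) -> str:
    has_stub = (site_dir / f"{name}.pyi").is_file()
    is_extension = any(
        (site_dir / f"{name}{suffix}").is_file() for suffix in EXTENSION_SUFFIXES
    )
    if is_extension:
        return "case 1" if has_stub else "untyped extension"
    if (site_dir / f"{name}.py").is_file():
        # Without a stub, nothing on disk says whether inline types exist.
        return "case 2" if has_stub else "case 3 or untyped"
    return "not found"


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    ext_file = f"ext{EXTENSION_SUFFIXES[0]}"  # e.g. a .so or .pyd name
    for fname in (ext_file, "ext.pyi", "pure.py", "pure.pyi", "plain.py"):
        (site / fname).write_text("")
    labels = {m: classify_top_level(site, m) for m in ("ext", "pure", "plain", "gone")}

print(labels)
```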

Since the first two cases always need a .pyi file, I think recognizing a .pyi file in the same directory is very much what you would expect to work, so we really should support this, especially because the first case requires such an unintuitive/disadvantageous workaround in the form of a re-exporting __init__.py file.

Addressing the third case would require a solution that is not obvious (defining magic variables like __py_typed__ = True in .pyi files and introducing a brand-new file extension like .typed-module are both rather arbitrary), so I have to agree with @erictraut that the inconvenience of having to turn a .py module into a package is not great enough to warrant introducing such an arbitrary solution (and thus additional complexity). By complexity I don't mean complexity in the implementation of type checkers (checking for a .typed-module file would be quite easy) but rather complexity in the conceptual model of "how to distribute types for Python packages" that Python developers have to deal with when distributing packages or debugging the type stub resolution order.

not-my-profile, Jan 10 '23 08:01

There is already a way to handle inlined types for a single-file module: convert it to a multi-file package and add a "py.typed" marker. This isn't too onerous.

True; however, I've seen project owners reject adding the type hint marker because it would have required them to move to a package structure. It would be a much easier sell if there were a solution that didn't require a structure change.

I do agree that adding a separate file with a new suffix would be too complicated. What about adding __py_typed__ = True to the python module itself? That would be simple enough and type checkers would only need to check for it for single file modules.
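Because a type checker must not execute the module, it would have to find such a marker statically. A minimal sketch using `ast`, assuming the marker is honored only as a top-level `__py_typed__ = True` (plain or annotated assignment); this is the proposal above, not an existing convention, and `declares_py_typed` is a hypothetical name:

```python
import ast


def declares_py_typed(module_source: str) -> bool:
    """True if the module has a top-level `__py_typed__ = True`."""
    for node in ast.parse(module_source).body:
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value
        else:
            continue  # only module-level assignments count
        names = [t.id for t in targets if isinstance(t, ast.Name)]
        if "__py_typed__" in names and isinstance(value, ast.Constant) and value.value is True:
            return True
    return False


source = '"""A single-file library."""\n__py_typed__ = True\n\ndef f(x: int) -> int:\n    return x\n'
print(declares_py_typed(source))
```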

cdce8p, Jan 10 '23 17:01

I've seen project owners rejecting adding the type hint marker because it would have required them to move to a package structure

Really? That surprises me. Can you provide examples?

I guess I'm not very sympathetic to this argument. This is a really low bar. It involves a one-time change that requires just a few minutes of work. If a library maintainer is unwilling to do this, then they're just looking for excuses not to support typing. I don't think that inventing redundant mechanisms is the right solution to this problem.

erictraut, Jan 10 '23 17:01

I've seen project owners rejecting adding the type hint marker because it would have required them to move to a package structure

Really? That surprises me. Can you provide examples?

Unfortunately not. It was some time ago and IIRC the owner wasn't too convinced about the usefulness of typing.

This is a really low bar. It involves a one-time change that requires just a few minutes of work. If a library maintainer is unwilling to do this, then they're just looking for excuses not to support typing.

The refactoring might be simple but packaging is another story. Sometimes it's easier to leave it alone if it works. Adding a simple line to the Python file wouldn't require any other changes.

-- Side note: From time to time I also come across projects that do have a py.typed file in their repo but don't include it in their sdist / wheel. Packaging is hard sometimes, especially if you don't do it frequently 🤷🏻‍♂️

cdce8p, Jan 10 '23 17:01

Hi! Sorry for not responding sooner, been a bit busy recently.

Let me start by giving context into why PEP 561 makes the tradeoffs it does, and what I was discussing with type checker authors at the time.

  1. I think the largest reason was that, at the time, reading installed package metadata had only just been added to the standard library (importlib.metadata was only added in 3.8). We didn't really want to make people depend on third-party packages to get metadata about typing status. I think there was also some concern that non-Python type checkers would have a harder time reading package metadata because they would have to build that infrastructure themselves.
  2. There was significant interest at the time, from users and type checker authors, in having per-package (in the folder-of-Python-code sense) metadata, and allowing py.typed to exist in that folder seemed the simplest way to accomplish that. This enables gradual adoption of typing.
  3. py.typed did not require integrations from all packaging tools. It was a significantly smaller amount of implementation work to reach people.

That being said, I do think PEP 561 is a bit lacking.

I've seen project owners rejecting adding the type hint marker because it would have required them to move to a package structure

I've actually also seen this, but I cannot recall where. I do still think the bar is pretty low, but I also think the UX for marking a package as typed could be better.

One of the original alternate designs for PEP 561 was to include typing support status in the distribution metadata. This would most likely exist as a list of files in the distribution that support typing or something like that. This solution is particularly appealing now that 3.7 is almost end of life (June of this year), and so soon all versions of Python will support importlib.metadata. There are however downsides, such as third party type checkers that don't want to call out into Python needing to implement this logic themselves. In addition, it is rather orthogonal to py.typed, so I worry it could be confusing (maybe if the UI is just typed=True to the user though, that isn't as much of a concern).

I agree though that if we don't want to shift to something like the above metadata-based system, keeping the status quo and suggesting maintainers change the layout of the package is an acceptable solution.

emmatyping, Jan 12 '23 07:01

Possible hacky solution: Use a .pyi file, but put some special marker code in the .pyi file that indicates "look inline".

I'm very likely missing something well known here, but could someone tell me why a type checker needs a marker to consider looking inline rather than just looking inline unconditionally?

Kentzo, Jan 12 '23 08:01

@Kentzo it's a good question, particularly in 2023. The main reason is that if a package doesn't have types or only has partial types, it's useful to warn a user about that so they don't falsely think they have typing coverage. This also gives the user the opportunity to install a stubs package themselves / the type checker to easily detect this situation and hint to do so.

Historical reasons are that annotations weren't always reserved for typing use and type checkers sometimes struggled with unusual code.

If I were writing a type checker in 2023, I'd probably always try to analyse the code because more information is usually better and you can still type check the shape of things, but I'd use the absence of py.typed to surface an error to the user.

hauntsaninja, Jan 12 '23 08:01

FYI I ran into this recently due to the lack of support for top-level .pyi files. In my use case the project is a single .py file by design for ease of copying/integration with non-Python code and is purposefully kept small so it can be passed entirely as a string on the command-line (e.g., Rust code can embed the entire file as a string constant and execute a subprocess with the string constant passed in via argv). I would be happy to provide a .pyi file for the single function the code exposes, but that currently doesn't work with mypy due to the current standards.
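A Python analogue of that workflow, for illustration only (the comment describes Rust as the host language, and `TOOL_SOURCE` is a hypothetical stand-in for the real single-file module): the whole module travels as one string via `-c`, so there is no package directory to hold a py.typed marker:

```python
import subprocess
import sys

# Hypothetical single-file tool; in the real case this would be the
# entire text of the project's one .py file, embedded as a constant.
TOOL_SOURCE = """
import sys

def main(argv):
    print("args:", argv)

main(sys.argv[1:])
"""

# With `python -c`, sys.argv[0] is "-c" and the extra arguments follow it.
result = subprocess.run(
    [sys.executable, "-c", TOOL_SOURCE, "alpha", "beta"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```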

brettcannon, Aug 08 '23 18:08