Hi!

This issue is going to be a summary of my prototypes to generate Python Interface files ('stub files', '.pyi files') automatically. The prototypes are available as #2379 and #2447.

#2379 aimed to generate Python stub files entirely at compile-time. This is not possible because proc-macros run before type inference and trait analysis, so proc-macros cannot know if a type implements a trait or not.
#2447 aims to generate information at compile-time that represents the structure of #[pyclass] structs (which methods exist, what arguments they accept) to be read at run-time by the stub generator.

I'm presenting the results here to get feedback on the current approach. I'm thinking of extracting parts of the prototypes as standalone features and PRs.

Progress

Accessing type information at runtime:

[x] #2490
[ ] Generate precise type information as part of #[pyclass]

Accessing structural information at runtime:

[ ] Declare the inspection API
[ ] Generate the inspection API structures as part of #[pyclass] and #[pymethods]
[ ] Collect the inspection data per module

Python interface generation:

[ ] Generate the Python interface files
[ ] Document how to generate PYI files in end user's projects

Summary

The final goal is to provide a way for developers who use PyO3 to automatically generate Python Interface files (.pyi) with type information and documentation, to enable Rust extensions to 'feel' like regular Python code for end users via proper integration in various tools (MyPy, IDEs, Python documentation generators).

I have identified the following steps to achieve this goal. Ideally, each step will become its own PR as a standalone feature.

1. provide a way to extract the full Python type information from any object passed to/retrieved from Python (e.g. List[Union[str]], not just PyList).
2. provide an API to describe Python objects at run-time (list of classes, list of methods for these classes, list of arguments of each method, etc.).
3. improve the macros so they generate at compile-time the various inspection data structures (the API from 2.)
4. write a run-time pyi generator based on the inspection API

1 and 2 are independent, 3 and 4 are independent.

Full type information

The goal of this task is to provide a simple way to access the string representation of the Python type of any object exposed to Python. This string representation should follow the exact format of normal Python type hints.

First, a structure representing the various types is created (simplified version below, prototype here):

enum TypeInfo {
    Any,
    None,
    Optional(Box<TypeInfo>),
    // ...
    Builtin(&'static str),
    Class {
        module: Option<&'static str>,
        name: &'static str,
    },
}

impl Display for TypeInfo {
    // Convert to a proper String
}
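
To make the string representation concrete, here is a minimal sketch of what the Display implementation could look like. It is not the prototype's code: the List variant and the "module.name" rendering of classes are assumptions added for illustration.

```rust
use std::fmt;

/// Simplified stand-in for the prototype's type description (not the real PyO3 type).
#[derive(Debug, Clone, PartialEq)]
pub enum TypeInfo {
    Any,
    None,
    Optional(Box<TypeInfo>),
    /// Hypothetical variant, added only to show nesting.
    List(Box<TypeInfo>),
    Builtin(&'static str),
    Class {
        module: Option<&'static str>,
        name: &'static str,
    },
}

impl fmt::Display for TypeInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TypeInfo::Any => write!(f, "Any"),
            TypeInfo::None => write!(f, "None"),
            // Generic types recurse into their parameter.
            TypeInfo::Optional(inner) => write!(f, "Optional[{}]", inner),
            TypeInfo::List(inner) => write!(f, "List[{}]", inner),
            TypeInfo::Builtin(name) => write!(f, "{}", name),
            // Assumption: a class is qualified with its module when one is known.
            TypeInfo::Class { module: Some(module), name } => write!(f, "{}.{}", module, name),
            TypeInfo::Class { module: None, name } => write!(f, "{}", name),
        }
    }
}
```

With this sketch, TypeInfo::Optional(Box::new(TypeInfo::Builtin("int"))).to_string() yields "Optional[int]".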

PyO3 already has traits that represent conversion to/from Python: IntoPy and FromPyObject. These traits can be enhanced to return the type information. The Python convention is that all untyped values should be considered as Any, so the methods can be added with Any as a default to avoid breaking changes (simplified version below, prototype here):

pub trait IntoPy<T> {
    // current API omitted

    fn type_output() -> TypeInfo {
        TypeInfo::Any
    }
}

pub trait FromPyObject {
    // current API omitted

    fn type_input() -> TypeInfo {
        TypeInfo::Any
    }
}

The rationale for adding two different methods is:

Some structs implement one trait but not the other (e.g. enums which use derive(FromPyObject)), so adding the method to only one of the traits would not work in all cases.
Creating a new trait with a single method would be inconvenient for PyO3 users in general, as it would mean implementing one more trait for each Python-exposed object.
Both methods have a sensible default and are trivial to implement, so I don't believe there are any downsides.
Some Python classes should have a different type when appearing as a function input and output, for example Mapping<K, V> as input and Dict<K, V> as output. Using two different methods supports this use case out-of-the-box.

After this is implemented for built-in types (prototype here), using them becomes as easy as format!("The type of this value is {}", usize::type_input()) which gives "The type of this value is int".
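
The sketch below illustrates the two-method design without depending on PyO3. TypeInput and TypeOutput are hypothetical stand-ins for FromPyObject and IntoPy, and the Mapping/Dict variants are assumptions chosen to show the input/output asymmetry mentioned above.

```rust
use std::collections::HashMap;
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub enum TypeInfo {
    Any,
    Builtin(&'static str),
    Mapping(Box<TypeInfo>, Box<TypeInfo>),
    Dict(Box<TypeInfo>, Box<TypeInfo>),
}

impl fmt::Display for TypeInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TypeInfo::Any => write!(f, "Any"),
            TypeInfo::Builtin(name) => write!(f, "{}", name),
            TypeInfo::Mapping(k, v) => write!(f, "Mapping[{}, {}]", k, v),
            TypeInfo::Dict(k, v) => write!(f, "Dict[{}, {}]", k, v),
        }
    }
}

/// Hypothetical stand-in for `FromPyObject`: values received from Python.
pub trait TypeInput {
    fn type_input() -> TypeInfo {
        TypeInfo::Any // untyped values are treated as Any
    }
}

/// Hypothetical stand-in for `IntoPy`: values returned to Python.
pub trait TypeOutput {
    fn type_output() -> TypeInfo {
        TypeInfo::Any
    }
}

impl TypeInput for usize {
    fn type_input() -> TypeInfo { TypeInfo::Builtin("int") }
}
impl TypeOutput for usize {
    fn type_output() -> TypeInfo { TypeInfo::Builtin("int") }
}
impl TypeInput for String {
    fn type_input() -> TypeInfo { TypeInfo::Builtin("str") }
}
impl TypeOutput for String {
    fn type_output() -> TypeInfo { TypeInfo::Builtin("str") }
}

/// A type that keeps the default and is therefore reported as Any.
pub struct Opaque;
impl TypeInput for Opaque {}

// Accept any mapping as input, but promise a concrete dict as output.
impl<K: TypeInput, V: TypeInput> TypeInput for HashMap<K, V> {
    fn type_input() -> TypeInfo {
        TypeInfo::Mapping(Box::new(K::type_input()), Box::new(V::type_input()))
    }
}
impl<K: TypeOutput, V: TypeOutput> TypeOutput for HashMap<K, V> {
    fn type_output() -> TypeInfo {
        TypeInfo::Dict(Box::new(K::type_output()), Box::new(V::type_output()))
    }
}
```

Generic containers simply delegate to their parameters, so <HashMap<String, usize>>::type_input() renders as "Mapping[str, int]" while type_output() renders as "Dict[str, int]".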

Inspection API

This section consists of creating an API to represent Python objects.

The main entry point for users would be the InspectClass trait (simplified, prototype here):

pub trait InspectClass {
    fn inspect() -> ClassInfo;
}

A similar trait would be created for modules, so it becomes possible to access the list of classes in a module. This requires creating a structure for each Python language element (ModuleInfo, ClassInfo, FieldInfo, ArgumentInfo…, prototype here).

At this point, using this API would require instantiating all structures by hand.
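
As an illustration of what "by hand" means here, the following sketch defines a reduced set of inspection structures and implements InspectClass manually for one class. All field names, the MethodInfo structure, the use of plain strings for types, and the Point example are assumptions; the prototype's structures are richer.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ArgumentInfo {
    pub name: &'static str,
    pub py_type: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MethodInfo {
    pub name: &'static str,
    pub args: Vec<ArgumentInfo>,
    pub return_type: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FieldInfo {
    pub name: &'static str,
    pub py_type: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ClassInfo {
    pub name: &'static str,
    pub fields: Vec<FieldInfo>,
    pub methods: Vec<MethodInfo>,
}

pub trait InspectClass {
    fn inspect() -> ClassInfo;
}

/// Example class; in a real project this would carry #[pyclass].
pub struct Point {
    pub x: f64,
    pub y: f64,
}

// Written by hand here; the later steps aim to have the macros generate this.
impl InspectClass for Point {
    fn inspect() -> ClassInfo {
        ClassInfo {
            name: "Point",
            fields: vec![
                FieldInfo { name: "x", py_type: "float" },
                FieldInfo { name: "y", py_type: "float" },
            ],
            methods: vec![MethodInfo {
                name: "distance",
                args: vec![ArgumentInfo { name: "other", py_type: "Point" }],
                return_type: "float",
            }],
        }
    }
}
```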

Compile-time generation

Proc-macros can statically generate all information needed to automatically implement the inspection API: structural information (fields, etc.) is already known, and type information can simply be delegated to the IntoPy and FromPyObject traits, since all parameters and return values must implement at least one of them.

Various prototypes:

https://github.com/PyO3/pyo3/pull/2447/commits/38f0a5928ffb1c2a41f6b98529f6c964a959b082: extract classes
https://github.com/PyO3/pyo3/pull/2447/commits/56b85cfbbd3997027cf020a6dbc7f4a62a4b5403: extract the list of functions
https://github.com/PyO3/pyo3/pull/2447/commits/8125521967a171b10aa8e467323783a8d277428f: extract a function's kind (function, class method, static method…)
https://github.com/PyO3/pyo3/pull/2447/commits/4070ad42904a250cb910758bd40b444274e04ed0: extract the function's return type,
https://github.com/PyO3/pyo3/pull/2447/commits/53f2e941030aa3c8bab22758570a0f4465eddfe3: extract attributes annotated with #[pyo3(get, set)],
https://github.com/PyO3/pyo3/pull/2447/commits/003d27572f4d5c036ee759affcced1c2f0a7d62a: extract argument names and type

This is done via two new traits, InspectStruct and InspectImpl, which respectively contain the information captured from #[pyclass] and #[pymethods]. Because a trait can be implemented only once per type, and each #[pymethods] block would generate its own InspectImpl implementation, this prototype is not compatible with multiple-pymethods. I do not know whether it is possible to make it compatible in the future.
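
The delegation idea (names come from the macro input, types come from the conversion traits) can be illustrated with a declarative macro. This is only a stand-in for the proc-macros: inspect_fn!, TypeHint and FunctionInfo are hypothetical names, and a real implementation would work on the token stream of #[pymethods] instead.

```rust
/// Hypothetical stand-in for the type information carried by the conversion traits.
pub trait TypeHint {
    fn type_hint() -> String {
        "Any".to_string()
    }
}

impl TypeHint for usize {
    fn type_hint() -> String { "int".to_string() }
}
impl TypeHint for String {
    fn type_hint() -> String { "str".to_string() }
}

#[derive(Debug, PartialEq)]
pub struct FunctionInfo {
    pub name: &'static str,
    pub args: Vec<(&'static str, String)>,
    pub return_type: String,
}

/// The macro only sees tokens: it records names with `stringify!` and emits
/// calls to the trait, so the actual types are resolved later by the compiler.
macro_rules! inspect_fn {
    ($name:ident ( $($arg:ident : $ty:ty),* ) -> $ret:ty) => {
        FunctionInfo {
            name: stringify!($name),
            args: vec![$((stringify!($arg), <$ty as TypeHint>::type_hint())),*],
            return_type: <$ret as TypeHint>::type_hint(),
        }
    };
}

/// What the generated code for `fn sum_as_string(a: usize, b: usize) -> String` could return.
pub fn sum_as_string_info() -> FunctionInfo {
    inspect_fn!(sum_as_string(a: usize, b: usize) -> String)
}
```

The macro never needs to know that usize maps to int; it generates code that asks the trait, which is why this approach sidesteps the lack of type information at macro-expansion time.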

Python Interface generator

Finally, a small runtime routine can be provided to generate the .pyi file from the compile-time extracted information (prototype here).
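
A minimal sketch of such a routine is shown below. It is not the prototype's generator: the structures are reduced to the bare minimum, the names are assumptions, and only instance methods are handled.

```rust
pub struct ArgumentInfo {
    pub name: &'static str,
    pub py_type: &'static str,
}

pub struct MethodInfo {
    pub name: &'static str,
    pub args: Vec<ArgumentInfo>,
    pub return_type: &'static str,
}

pub struct ClassInfo {
    pub name: &'static str,
    pub methods: Vec<MethodInfo>,
}

/// Render one class as the text of a .pyi stub.
pub fn render_class(class: &ClassInfo) -> String {
    let mut out = format!("class {}:\n", class.name);
    if class.methods.is_empty() {
        // A class body cannot be empty in Python.
        out.push_str("    ...\n");
    }
    for method in &class.methods {
        // Assumption: every method is an instance method, so `self` comes first.
        let mut params = vec!["self".to_string()];
        params.extend(
            method
                .args
                .iter()
                .map(|arg| format!("{}: {}", arg.name, arg.py_type)),
        );
        out.push_str(&format!(
            "    def {}({}) -> {}: ...\n",
            method.name,
            params.join(", "),
            method.return_type
        ));
    }
    out
}
```

For a Point class with a single distance(other: Point) -> float method, this produces a two-line stub: the class header followed by "    def distance(self, other: Point) -> float: ...".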

Thanks to the previous steps, it is possible to retrieve all information necessary to create a complete typed interface file with no further annotations from a user of the PyO3 library. I think that's pretty much the perfect scenario for this feature, and although it seemed daunting at first, I don't think it's so far fetched now :smile:

The current state of the prototype is described here: https://github.com/PyO3/pyo3/pull/2447#issuecomment-1156564249.

Jun 15 '22 14:06 CLOVIS-AI

@CLOVIS-AI just a ping to say I haven't forgotten about this; have been ill / busy and was just about cleared through the backlog enough to read this when #2481 came up. I think I need to push through some security releases first, after which my plan was to finish #2302 and then loop back here with us ready to support a syntax for annotations. Sorry for the delay.

Jun 28 '22 07:06 davidhewitt

@davidhewitt don't worry about the full review for now, it's just a prototype. If you have time, please just read this issue and give me your general feedback on the idea. If it seems good with you, I'll be able to start writing a real PR for at least a part of it and we can do a full review then :+1:

Jul 01 '22 08:07 CLOVIS-AI

Hey, so I finally found a moment to sit down and think about this. Thank you for working on this and for having patience with me.

This looks great, I think this is definitely the right way to go. In particular splitting into two traits for the input/output I think is correct.

Some thoughts and questions:

I've wanted something like FromPyObject::type_input for a long time, I think it can be used to implement improved error messages. In particular in PyDowncastError we currently store the type name, e.g. PyString, but with type_input we could do something better (as long as the input is not Any, I guess).
I think it would be nice to feature-gate the inspection API and macro codegen for it - presumably it would be used with a debug build to emit type stubs, and then it wouldn't be needed in the final release build.
To be valid .pyi files they often need to import type dependencies. Do you have a vision how we might be able to make this work?
Imagine users create custom MyDict[K,V] class which would be a generic type (in Python's eyes, the Rust code would potentially just use PyAny). Can we support it with this proposal?
For cases like the above, if we can't support them, we might need mechanisms to load external type stub fragments to combine into the final artifact.
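
To illustrate the first point, here is a rough sketch of how an expected-type string could feed an error message. TypeInput and conversion_error are hypothetical names, the message wording is illustrative, and nothing here reflects how PyDowncastError is actually implemented.

```rust
/// Hypothetical stand-in for the proposed `FromPyObject::type_input`.
pub trait TypeInput {
    fn type_input() -> String {
        "Any".to_string()
    }
}

impl TypeInput for String {
    fn type_input() -> String { "str".to_string() }
}

impl<T: TypeInput> TypeInput for Vec<T> {
    fn type_input() -> String { format!("List[{}]", T::type_input()) }
}

/// A type that keeps the default `Any`.
pub struct Opaque;
impl TypeInput for Opaque {}

/// Build an error message for a failed conversion into `T`.
/// `actual` is the Python type name of the object that was received.
pub fn conversion_error<T: TypeInput>(actual: &str) -> String {
    let expected = T::type_input();
    if expected == "Any" {
        // Nothing useful to report about the expected type.
        format!("'{}' object cannot be converted", actual)
    } else {
        format!("'{}' object cannot be converted to '{}'", actual, expected)
    }
}
```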

Overall, yes, I'm happy to proceed with this - as a first step I'd suggest we get type_input and type_output merged. We could already use them to improve error messages and also maybe add function signatures in their docstrings. That would buy us some time to figure out the introspection API, which I think will have some complexity.

Jul 07 '22 07:07 davidhewitt

I was thinking of feature-gating the macro generation but not the inspection API itself (so you would be able to construct inspection APIs yourself in all cases, but would need to enable the automatic implementation). I assume that the API itself will not have any significant effect on compile-time, since it's just normal structs. What do you think?

The TypeInfo type has a module method that returns the name of the module a class is from. When generating a .pyi file, a recursive visit of all types appearing in the file would yield the list of modules to import. This may be too simplistic (and will probably fail if two classes in different modules have the same name), but it should be good enough for most users (for example, in my case, we have a single module anyway).
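
A sketch of that recursive visit follows, under the assumption that TypeInfo looks roughly like the simplified version from the issue description (the List variant is added for illustration). It deliberately ignores the name-collision problem mentioned above and does not emit the typing imports that Optional or List would need.

```rust
use std::collections::BTreeSet;

pub enum TypeInfo {
    Any,
    Builtin(&'static str),
    Optional(Box<TypeInfo>),
    List(Box<TypeInfo>),
    Class {
        module: Option<&'static str>,
        name: &'static str,
    },
}

/// Walk a type and record every module that one of its classes comes from.
pub fn collect_modules(ty: &TypeInfo, modules: &mut BTreeSet<&'static str>) {
    match ty {
        TypeInfo::Any | TypeInfo::Builtin(_) => {}
        // Generic types: recurse into the parameter.
        TypeInfo::Optional(inner) | TypeInfo::List(inner) => collect_modules(inner, modules),
        TypeInfo::Class { module, .. } => {
            if let Some(module) = module {
                modules.insert(*module);
            }
        }
    }
}

/// Produce the import lines for all types appearing in a stub file.
/// The BTreeSet removes duplicates and keeps the output sorted.
pub fn import_lines(types: &[TypeInfo]) -> Vec<String> {
    let mut modules = BTreeSet::new();
    for ty in types {
        collect_modules(ty, &mut modules);
    }
    modules
        .into_iter()
        .map(|module| format!("import {}", module))
        .collect()
}
```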

About custom generics: the approach in this documentation couldn't, but the one in https://github.com/PyO3/pyo3/pull/2490 can (however, it must be through user input, I don't see a way in which the macros could guess that).

Combining external information with the generated ones will be trivial: because the macros will generate an implementation of the inspection API, and the .pyi generation will take an implementation as parameter, users can simply edit the generated implementation before passing it to the .pyi generation.

Jul 10 '22 09:07 CLOVIS-AI

It seems like #2490 will be merged soon. I won't have a lot of time on my hands in the near future, so if someone else wants to help in the meantime, the next big question is the way to represent the program (Python classes, Python methods, Python modules) as Rust structures.

My prototypes are close to solving the problem, except that I'm not a fan of how they deal with modules. The structures themselves seem fine, but the way to convert a #[pymodule] function into them is unclear.

Sep 06 '22 17:09 CLOVIS-AI

Not sure if it is relevant here:

I have written a small Python script to generate type stubs from PyO3 libraries whose docstrings include type annotations (using the :type and :rtype: format). It works on already-built libraries using Python's introspection features. To use it, run python generate_stubs.py MY_PACKAGE MY_FILE.pyi. Here is an example of generated stubs.

Oct 10 '22 13:10 Tpt

@Tpt That's great! However if I understand correctly you still have to declare the type twice (first as a Rust type, then as a Python type in the documentation), which is error-prone, and what this issue tries to avoid. I agree that it's already a great step up from the current situation of writing the .pyi entirely manually.

Oct 11 '22 10:10 CLOVIS-AI

@CLOVIS-AI Yes! Exactly. Indeed, avoiding duplicated types would be much better. I wanted to get something working quickly for now instead of going down the rabbit hole of auto-generation from Rust.

Oct 12 '22 08:10 Tpt

Would love to have this!

Dec 16 '22 21:12 kylecarow

Looking forward to the features!

Jan 04 '23 01:01 fzyzcjy

Hi, I have changed workplace and do not have time to contribute to this project anymore. If someone wants to continue this PR, please feel free to. My prototype is still online, and the outline described here should be good.

Jan 04 '23 09:01 CLOVIS-AI

I'm not experienced enough to help with this; I just want to say that it would be of great help for my use case. In the meantime, I'm going to write the .pyi files by hand.

Jan 09 '23 09:01 PierreMardon

I've played around with #2447 in the last few days and I tried to fix some failed tests.

I got stuck on a missing IntoPy impl for PyResult; currently the macro generates:

<crate::PyResult<&crate::PyAny,> as _pyo3::conversion::IntoPy<_>>::type_output()

Simply providing impl IntoPy for PyResult<T> will not work because there are already IntoPyCallbackOutput impls.

I was thinking about moving type_input()/type_output() into a separate trait,

pub trait WithTypeInfo {
    fn type_output() -> TypeInfo;
    fn type_input() -> TypeInfo;
}

with impl<T: WithTypeInfo> WithTypeInfo for PyResult<T>. Is this a good idea?
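
A minimal sketch of this separate-trait idea, runnable without PyO3: std's Result stands in for PyResult, and plain strings stand in for TypeInfo. Both substitutions are assumptions made to keep the example self-contained.

```rust
/// Separate trait carrying only the type information.
pub trait WithTypeInfo {
    fn type_output() -> String;
    fn type_input() -> String;
}

impl WithTypeInfo for i64 {
    fn type_output() -> String { "int".to_string() }
    fn type_input() -> String { "int".to_string() }
}

impl<T: WithTypeInfo> WithTypeInfo for Vec<T> {
    fn type_output() -> String { format!("List[{}]", T::type_output()) }
    fn type_input() -> String { format!("List[{}]", T::type_input()) }
}

// `Result` stands in for `PyResult<T>`. The error side surfaces as a Python
// exception, so only the success type appears in the annotation.
impl<T: WithTypeInfo, E> WithTypeInfo for Result<T, E> {
    fn type_output() -> String { T::type_output() }
    fn type_input() -> String { T::type_input() }
}
```

Because the blanket impl lives on a new trait, it does not collide with existing conversion impls, and <Result<Vec<i64>, std::io::Error>>::type_output() evaluates to "List[int]".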

Feb 04 '23 03:02 op8867555

@op8867555 good question, and I'm not sure I can give you the answer easily. The downside of moving into a separate trait is that you might find without specialization this creates a lot of work. Having the methods on the IntoPy trait allows for default implementations for &PyAny.

I think the best answer is - if you're willing to give it a go, please do, and let's see how that works out :)

Feb 07 '23 21:02 davidhewitt

I've tried the separate trait approach, and it solved the PyResult issue I mentioned before. However, I haven't figured out how to support specialization (e.g. Vec<u8> generates List[int] instead of bytes for now). I've tried to apply this autoref trick, but I didn't get it to work with user-created WithTypeInfo impls.

Also, I've tried embedding field info into PyClassItems, which makes it easy to support multiple pymethods.

Feb 10 '23 17:02 op8867555

(e.g. Vec<u8> generates List[int] instead of bytes for now)

Note that this is very much the case in PyO3 that Vec<u8> creates List[int], your annotation is correct 😄

I've tried to apply this autoref tricks but I didn't get it works with user-created WithTypeInfo impls.

This setup may potentially work:

struct TypeAnnotation<T>(PhantomData<T>);

impl<T> WithTypeInfo for &'_ TypeAnnotation<T> {
    fn type_input() -> TypeInfo { TypeInfo::Any }
    fn type_output() -> TypeInfo { TypeInfo::Any }
}

and specific implementations can then use impl WithTypeInfo for TypeAnnotation<T>. Or maybe there's some context I am missing?
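
For reference, here is a self-contained sketch of the autoref technique in a form that compiles on stable Rust. It differs from the setup above in one respect: as far as I understand, autoref only applies to method calls with a receiver, so the sketch uses &self methods on two helper traits (Specific and Fallback, both hypothetical names) instead of associated functions. It only works where the concrete type is known, such as macro-generated code, and it does not address the orphan-rule problem for user-defined types.

```rust
use std::marker::PhantomData;

pub trait WithTypeInfo {
    fn type_output() -> String;
}

impl WithTypeInfo for i64 {
    fn type_output() -> String { "int".to_string() }
}

/// Zero-sized carrier for the type being annotated.
pub struct TypeAnnotation<T>(pub PhantomData<T>);

/// Preferred candidate: only applies when `T: WithTypeInfo`.
pub trait Specific {
    fn output(&self) -> String;
}
impl<T: WithTypeInfo> Specific for TypeAnnotation<T> {
    fn output(&self) -> String { T::type_output() }
}

/// Fallback: implemented on a reference, so method resolution needs one more
/// autoref to reach it and only picks it when `Specific` does not apply.
pub trait Fallback {
    fn output(&self) -> String;
}
impl<T> Fallback for &TypeAnnotation<T> {
    fn output(&self) -> String { "Any".to_string() }
}

/// A type without a `WithTypeInfo` impl.
pub struct Unannotated;

pub fn annotated() -> String {
    // `i64: WithTypeInfo`, so `Specific::output` is selected.
    (&TypeAnnotation::<i64>(PhantomData)).output()
}

pub fn unannotated() -> String {
    // No `WithTypeInfo` impl, so resolution falls through to `Fallback::output`.
    (&TypeAnnotation::<Unannotated>(PhantomData)).output()
}
```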

Also, I've tried embedding field info into PyClassItems, this makes multiple pymethods could be supported easily.

Yep that should work fine 👍

Feb 10 '23 21:02 davidhewitt

Note that this is very much the case in PyO3 that Vec<u8> creates List[int], your annotation is correct

Oh, I didn't notice that :sweat_smile: . Are there any other specialization cases PyO3 creates?

This setup may potentially work:
struct TypeAnnotation<T>(PhantomData<T>);

impl<T> WithTypeInfo for &'_ TypeAnnotation<T> {
    fn type_input() -> TypeInfo { TypeInfo::Any }
    fn type_output() -> TypeInfo { TypeInfo::Any }
}
and specific implementations can then use impl WithTypeInfo for TypeAnnotation<T>. Or maybe there's some context I am missing?

I tried this (with some modification) and didn't manage to make it work with user-defined datatypes (e.g. providing a type annotation for a Rust enum like this). There will be an error when trying to provide an impl for a non-pyclass datatype, since both WithTypeInfo and TypeAnnotation are defined outside of the crate. Also, there will be a conflict when both impl<T> _ for TypeAnnotation<Vec<T>> and impl _ for TypeAnnotation<Vec<u8>> are provided. It seems another layer of specialization can't be made this way.

Feb 15 '23 16:02 op8867555

Ugh. So I wonder, do we really need to support specialization?

Feb 16 '23 21:02 davidhewitt

I personally think it is nice to have, but we don't actually need it for now.

Feb 16 '23 22:02 op8867555

@Tpt When using the generate_stubs.py file I get the error: ValueError: The parameter a of rustpythonexample.sum_as_string has no type definition in the function documentation. How do I add type definition to the function signature? Thanks, David

Feb 27 '23 11:02 davidzanger

@Tpt When using the generate_stubs.py file I get the error: ValueError: The parameter a of rustpythonexample.sum_as_string has no type definition in the function documentation. How do I add type definition to the function signature? Thanks, David

The script uses sphinx syntax for type annotation embedded in the function doc comments. Here is an example in pyoxigraph

Feb 27 '23 11:02 Tpt

I believe #2863 would be really helpful here. If completed someone could create an external module (like @Tpt's) without the need for type annotations from the rust doc itself. This would probably be an acceptable solution for me if it was integrated with something like maturin.

Mar 17 '23 04:03 arihant2math

Would anyone like to take another stab at this?

I think this is a particularly valuable addition to PyO3. Automatic creation of type stubs means that static type checkers like pyright will be able to infer module namespaces, and autocomplete functions, classes and methods. This makes it a lot more user-friendly to users who are new to PyO3 and trying the string_sum example.

Jun 11 '23 09:06 thomasaarholt

I agree with this being extremely desirable. Personally, what I will try if I manage to ever invest any time into it is get upstream CPython to accept type annotations in __text_signature__. This gives PyO3 an obvious place to generate type annotations automatically, and then creating a .pyi from those annotations should be trivial.

Jun 11 '23 20:06 davidhewitt

get upstream CPython to accept type annotations in __text_signature__.

In the meantime, what about generating a custom attribute like __pyo3_text_signature__? Would it be possible/a sensible approach?

Jun 12 '23 06:06 Tpt

That's possible and a good suggestion. I think it would take some work - the current "builtin functions" which PyO3 uses don't have any way to add additional attributes on the PyO3 side (as far as I'm aware). So to add __pyo3_text_signature__ we'd first need to implement custom callables.

Jun 12 '23 19:06 davidhewitt

In the meantime, what about generating a custom attribute like __pyo3_text_signature__? Would it be possible/a sensible approach?

Even when/if type annotations in __text_signature__ land in CPython, they will still be unavailable on older CPython versions, so the transition could be painful. Could PyO3 allow users to write __text_signature__ with type annotations, parse it, and remove the annotations for CPython versions that do not support them?

Jun 12 '23 19:06 hombit

I expect PyO3's automatically-generated signatures would be able to add annotations or not according to the Python version being built for. As for users writing pyo3(text_signature = "...") by hand, we might be able to parse and strip, but I fear that would be unreliable and would prefer to just let users choose whether to include annotations or not.
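
For the automatically-generated case, the choice could be as simple as a flag at render time. The sketch below is purely illustrative: Param, text_signature and the with_annotations flag are hypothetical, and whether CPython would accept annotated signatures at all is exactly the open question.

```rust
pub struct Param {
    pub name: &'static str,
    /// Python type hint, if one is known for this parameter.
    pub annotation: Option<&'static str>,
}

/// Render a signature string, optionally including type annotations.
/// Assumption: the caller derives `with_annotations` from the Python
/// version being built for.
pub fn text_signature(params: &[Param], with_annotations: bool) -> String {
    let rendered: Vec<String> = params
        .iter()
        .map(|param| match (with_annotations, param.annotation) {
            (true, Some(annotation)) => format!("{}: {}", param.name, annotation),
            // Either annotations are disabled or none is known.
            _ => param.name.to_string(),
        })
        .collect();
    format!("({})", rendered.join(", "))
}
```

Since the annotations are kept as structured data until the last moment, nothing has to be parsed back out of a string, which avoids the reliability concern raised for hand-written signatures.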

Jun 12 '23 20:06 davidhewitt

We've got a hacked up version of this mostly working, if people are interested. It uses a slightly different approach than other approaches I've seen.

We're assigning a _pyi_types dict attribute to each module, which contains data describing the functions/classes/types exposed to python. This is done using custom macros that wrap the existing pyo3 macros. The _pyi_types data can then be picked up by a python (or rust) script and used to generate the pyi file.

For example

trait PyInterface {
    fn pyi_type() -> String;
}

impl PyInterface for String {
    fn pyi_type() -> String {
        "str".to_owned()
    }
}

#[pyi_type("Union[str, float]")] // generates PyInterface impl
#[derive(FromPyObject)]
enum ManyType {
    Str(String),
    Float(f64),
}

#[custom_pyfunction] // populates const data with name, arg params and types, and return type
/// docstring goes here
fn my_func(a: ManyType, b: String) {

}


#[custom_pymodule]
fn my_module(_py: Python, m: &PyModule) -> PyResult<()> {
    // custom_pymodule sets a _pyi_types dict on the module

    // custom_wrap_pyfunction inserts data into the _pyi_types dict based on the types in `my_func`
    m.add_function(custom_wrap_pyfunction!(my_func)?)?;
    Ok(())
}

Then your Python script can inspect the types and generate a .pyi file.

from my_module import _pyi_types
assert _pyi_types["functions"]["my_func"]["return_type"] is None
assert _pyi_types["functions"]["my_func"]["params"] == [("a", "Union[str, float]"), ("b", "str")]
assert _pyi_types["functions"]["my_func"]["docstring"] == "docstring goes here"

Advantages:

python version agnostic
plugs in to existing python documentation tooling, such as pdoc
low boiler-plate on the rust side

Disadvantages:

requires the module to be built first
requires the module to be importable to generate types. If you're cross-compiling, this could complicate things
you can easily define custom pyi_types on the rust side that don't map to any valid python type. E.g. UnionZZZ[str, float], which would break your pyi file
requires another tool to be run for the pyi generation

Now that the proof-of-concept works, I have a few questions: Do people want this as a new crate? Or should we try to integrate it into pyo3? Other feedback on the design is welcome. If we do decide to integrate this into pyo3, I'm also wondering how we go about making it an opt-in feature.

It's nowhere near production ready yet (especially w.r.t. the error messages you get from the proc macros)

Jun 22 '23 20:06 jmrgibson

@jmrgibson Your approach seems very similar to mine (in the PRs linked in the original comment of this issue). The main differences are:

I integrated the Python types directly in IntoPy & co. This is important, because some Python types are different when we accept them and emit them (e.g. we take Iterable as parameter but return List to follow the Python conventions)
My version generates code that generates types, instead of directly generating types. This has the benefit of solving the type parameter issue: a list generates code that declares a list of whatever it contains, and that type knows what its own value is.
My version had a confusion as to how to export the data to the user, since they are built at runtime. I was thinking of writing a Rust unit test that would generate the file, but it didn't seem that nice a solution. Exposing the functions directly to Python and providing a generate_stubs function in Python seems a much better idea, indeed.

It seems @davidhewitt has a quite promising prototype directly built into CPython. If he hadn't, I would think mixing those two approaches (code generation to expose type definitions + runtime Python function to combine them into stubs) would be the cleanest solution.

Jun 26 '23 22:06 CLOVIS-AI

Python Interface (.pyi) generation and runtime inspection
