Consider adding Python annotation support
What is the expected enhancement?
Users are able to run mypy to perform a static type check on code that uses retworkx.
Python annotations are becoming more popular, and we should consider supporting them. Supporting annotations can help make retworkx more attractive in codebases that have adopted mypy. Also, many popular libraries that do not support annotations yet have open tickets requesting them.
Suggested Implementation
Create `.pyi` stubs containing the type annotations for the existing code. See typeshed for some inspiration on how this is done. Another useful inspiration is orjson: they are another Python library that uses Rust/PyO3, and they also use the `.pyi` stub approach.
Challenges
- Duplication: if we use `.pyi` stubs, we will have to duplicate all the function signatures in the stub files; ideally PyO3 would generate the stubs for us, but https://github.com/PyO3/pyo3/issues/510 is far from being closed
- Packaging: making sure the stubs are installed and that third-party code recognizes them when running mypy (see the sketch after this list)
- Testing: how do we test that our stubs are correct and up to date?
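On the packaging side, a minimal sketch of what shipping the stubs could look like, assuming a plain setuptools layout (retworkx's real build uses setuptools-rust, so the actual configuration will look different): PEP 561 asks for a `py.typed` marker next to the stubs so that mypy picks them up from the installed package.

```python
# Hypothetical setup.py fragment (PEP 561): include the stub files and the
# py.typed marker in the wheel so mypy in third-party code can find them.
from setuptools import setup, find_packages

setup(
    name="retworkx",
    packages=find_packages(),
    package_data={"retworkx": ["*.pyi", "py.typed"]},
    zip_safe=False,  # PEP 561: mypy cannot read type information from zipped installs
)
```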
Non-Goals
Use mypy internally within retworkx. The emphasis is: we want users to be able to run mypy when checking code that uses retworkx.
I had an idea that can help us avoid duplication and make type annotations manageable. Here is what I thought:
Idea
The idea is to make the `.pyi` stub files the source of `text_signature` in our Rust code. The stub files look like:
class PyGraph:
def has_edge(self, node_a: int, node_b: int, /) -> bool: ...
In Rust, we would load the text signature using a constant, to reuse the definition that originally lives in the `.pyi` file:
#[pyo3(text_signature = annotation::PYGRAPH_HAS_EDGE)]
pub fn has_edge(&self, node_a: usize, node_b: usize) -> bool {
Possibly we could also use an `annotation!` procedural macro to make it slightly better:
#[pyo3(text_signature = annotation!("PyGraph.has_edge"))]
pub fn has_edge(&self, node_a: usize, node_b: usize) -> bool {
That way, we do not need to worry about conflicting text signatures and annotations. All annotations would "magically" come from the `.pyi` files; once we update a stub file, the change propagates to the Rust code.
Implementation
A possible implementation of the idea is:
1. Create a Python script that parses the `.pyi` stubs and generates text signatures

Leverage Python's `ast` module to parse the `.pyi` files and output what PyO3 expects. The text signatures that PyO3 expects are slightly different from what is in the `.pyi` file: PyO3 expects `text_signature = "(self, node_a, node_b, /)"`, i.e. with the argument types and return types stripped out. The `ast` module has the tools to do that. I think we can parse `def has_edge(self, node_a: int, node_b: int, /) -> bool:`, remove the items we do not want, and then unparse back to a string we can consume, as in the sketch below.
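A minimal sketch of such a script (not the actual retworkx tooling; the name `text_signatures` is made up here), assuming Python 3.9+ for `ast.unparse`:

```python
# Parse a .pyi stub, strip annotations, and emit PyO3-style text signatures.
import ast

STUB = """
class PyGraph:
    def has_edge(self, node_a: int, node_b: int, /) -> bool: ...
"""

def text_signatures(stub_source: str) -> dict:
    """Map 'Class.method' to a text signature such as '(self, node_a, node_b, /)'."""
    signatures = {}
    for cls in ast.walk(ast.parse(stub_source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        for func in cls.body:
            if not isinstance(func, ast.FunctionDef):
                continue
            # Drop the return annotation and the per-argument annotations.
            func.returns = None
            for arg in ast.walk(func.args):
                if isinstance(arg, ast.arg):
                    arg.annotation = None
            # 'def has_edge(self, node_a, node_b, /):' -> '(self, node_a, node_b, /)'
            header = ast.unparse(func).splitlines()[0]
            signatures[f"{cls.name}.{func.name}"] = header[len(f"def {func.name}"):-1]
    return signatures

print(text_signatures(STUB))  # {'PyGraph.has_edge': '(self, node_a, node_b, /)'}
```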
2. Put the generated signatures as constants in a `.rs` file

Next, the Python script needs to dump the signatures into a file that Rust can consume. That would be an autogenerated `.rs` file with all the signatures as constants. We'd export the constants under `crate::annotation` and reuse them in other parts of the code.

A detail for this step is that we'd need to add a CI check to verify that the autogenerated file is up to date. I think it is better to keep the autogenerated file in Git, because that way we do not need to add the Python script to the build process.
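Continuing the sketch above, the dumping step could be as simple as the following (again hypothetical; the constant-naming convention is just one option):

```python
# Write the collected signatures as Rust string constants, e.g. into src/annotation.rs.
def dump_rust_constants(signatures: dict, path: str = "src/annotation.rs") -> None:
    lines = ["// Autogenerated from the .pyi stubs -- do not edit by hand.\n"]
    for qualified_name, signature in sorted(signatures.items()):
        # 'PyGraph.has_edge' -> 'PYGRAPH_HAS_EDGE'
        const_name = qualified_name.replace(".", "_").upper()
        lines.append(f'pub const {const_name}: &str = "{signature}";\n')
    with open(path, "w") as out:
        out.writelines(lines)
```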
3. Create an `annotation!` macro to load the signatures

This is where some additional magic could happen. The macro could parse its tokens and convert something like `PyGraph.has_edge` into `crate::annotation::PYGRAPH_HAS_EDGE`. The former is closer to the Python definition, so it could bring some value.
Sorry for coming into this late, especially since you have done a lot of work in #401.
We can leverage Rust's procedural attribute macros [1] together with the parsing capabilities of the syn crate to automatically generate the `.pyi` stubs. This means that we would generate the stubs during compilation, and we can do it conditionally by utilizing `cfg_attr` [2].
The idea is as follows:
- Define our own procedural macro where we parse the input as an `ItemFn` [3] and retrieve the name of the function, its arguments (name + type) and its output type:

    extern crate proc_macro;
    use proc_macro::TokenStream;
    use syn;

    #[proc_macro_attribute]
    pub fn pyhints(attr: TokenStream, item: TokenStream) -> TokenStream {
        let input: syn::ItemFn = syn::parse(item.clone()).unwrap();
        let sig = input.sig;
        // ....
        item
    }
- Define the mapping from Rust types to Python types.
- Some functions take in a `PyObject` but they really require a `Callable`. This information can be passed to the macro as an attribute.
- Dump all of these in `.pyi` files with the correct syntax.
In our rust code we will write something like:
#[pyfunction]
#[cfg_attr(feature = "mypy", pyhints)]
pub fn algo(graph: PyGraph, node: usize) -> usize { ... }
The downside is that this approach does not work for the functions defined in `__init__.py`, and we will have to do them manually. It might also be hard to do this for our custom iterators too.
Coming back with a more detailed approach (works for functions but not for classes) that is slightly different from what I had in mind in the previous comment:
- Define a custom trait like `trait ToPyHint { fn convert() -> String; }` that gives the description (just a string) of the equivalent Python type for the Rust types that we use in retworkx as arguments in pyfunctions.
- Implement a proc macro that receives a function definition and:
  - retrieves the name of the function, its arguments (name + type) and output type.
  - emits another function marked as a `#[test]`. Inside the body of this new function, we'll call `ToPyHint::convert()` for the types that appear in the original function signature, format everything according to the stub file syntax and print it to stdout.
- Run the tests with cargo, collect the output and dump the lines of interest in a new `.pyi` file.

I have an initial working implementation at https://github.com/georgios-ts/stubgen (it's not fully ready to use in retworkx).
This is remarkable progress! I think if you post it on the PyO3 thread they would be very interested. You have the best idea/proof-of-concept code since the debate started.
I did not have time to post earlier, but the main challenge is with `PyAny`. PyO3 erases type information when we use `PyAny` in Rust, and recovering it is not easy. I will take some examples from the PR; I have not yet figured out how to generate this from the Rust code:
class PyDiGraph(Generic[S, T]):
def compose(node_map_func: Optional[Callable[[S], int]]): ...
We can try replacing it with `Any`, but doing type checking wouldn't be as useful. I wonder if there is a meet-in-the-middle solution between parsing the Rust code and checking it against a reference `.pyi` file.
Yeah, that's true, using `PyAny` / `PyObject` is convenient but we lose type information. Apart from manually editing `.pyi` files, one solution might be to define thin wrappers around a `PyObject` that correspond to a type known to us but that we can't really tell PyO3 about, and use these in our pyfunctions.
For example, as a weight function in the shortest path algorithms, we could define
struct EdgeWeightFunc(PyObject);

impl ToPyHint for EdgeWeightFunc {
    fn convert() -> String {
        String::from("Callable[[EWeight], float]")
    }
}
with the convention that:
impl ToPyHint for PyDiGraph {
    fn convert() -> String {
        String::from("PyDiGraph[NWeight, EWeight]")
    }
}
and in the `.pyi` file we define:
NWeight = TypeVar('NWeight')
EWeight = TypeVar('EWeight')
Admittedly, it's getting a bit ugly.
Just to play devil's advocate: would it be that bad to manually maintain `.pyi` files? Writing them the first time can be painful, but it can be done gradually. And if you run mypy on your own codebase, you can test that the annotations are correct, so they aren't manual in the sense of having to make sure the codebase doesn't drift away from them, even if you do still have to fix them by hand.
It is not bad to maintain `.pyi` files by hand, at least for the frequent contributors and end users. #401 even has a file that indicates the annotations are partial, so if someone finds a mismatch or a missing annotation they can just write a local stub file with the correct annotation and mypy will use that instead (see the sketch below).
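For illustration, a minimal sketch of such a local override, assuming the stubs directory is put on mypy's search path (e.g. `MYPYPATH=stubs` or `mypy_path` in the config); the file path and the `__getattr__` escape hatch are just one way to keep the rest of the API unchecked:

```python
# stubs/retworkx/__init__.pyi -- hypothetical local stub overriding one method.
from typing import Any

class PyGraph:
    def has_edge(self, node_a: int, node_b: int, /) -> bool: ...
    def __getattr__(self, name: str) -> Any: ...  # leave other attributes unchecked

def __getattr__(name: str) -> Any: ...  # leave other module members unchecked
```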
Maintaining `.pyi` files by hand is particularly bad for first-time contributors, though. They already need to write code and create test cases, which is a lot. If we ask them to modify `.pyi` files too, that is an extra task they have to do, and they are also the most likely to forget to update the stubs. So if we had autogenerated annotations, it would make it easier to contribute and also keep the `.pyi` files up to date.
> they are the most likely to forget to update those
Would running MyPy on your tests help? That way if they forget they get an explicit error saying type annotations are missing for xyz method.
Running mypy on tests is not as helpful as it seems; it generally doesn't cover all cases. We're planning on going with specific tests for the annotations; see https://github.com/Qiskit/retworkx/blob/546f6ee9e8d1faaec7879829f8de0f85ca2b5b68/stubs-tests/pygraph_test.py for an example.
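For context, a stub test of that kind might look roughly like the sketch below (not the contents of the linked file): ordinary annotated Python that only passes a mypy run if the stubs declare the right signatures, so stubs that drift from the real API show up as type errors.

```python
# Hypothetical stub test: mypy checks these annotated assignments against the .pyi stubs.
import retworkx

graph = retworkx.PyGraph()
node_a: int = graph.add_node("A")
node_b: int = graph.add_node("B")
edge: int = graph.add_edge(node_a, node_b, 1.0)
exists: bool = graph.has_edge(node_a, node_b)
```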
This effort was concluded by #1061, now that we require new changes to add type annotations.