rustworkx icon indicating copy to clipboard operation
rustworkx copied to clipboard

Consider adding Python annotation support

Open IvanIsCoding opened this issue 3 years ago • 9 comments

What is the expected enhancement?

Users are able to execute mypy to run a static type check in code that uses retworkx.

Python annotations are becoming more popular, and we should consider supporting them. Supporting annotations can help make retworkx more popular in codebases that have adopted mypy. Also, many popular libraries that do not support annotations yet have tickets claiming for them.

Suggested Implementation

Create .pyi stubs containing the type annotations for the existing code. See typeshed for some inspiration on how this is done. Another useful inspiration is orjson, they are another Python library that uses Rust/PyO3 and they also use the .pyi stub approach.

Challenges

  • Duplication: if we use .pyi stubs, we will have to duplicate all the functions signatures in the stubs files; ideally PyO3 would generate all the stubs but https://github.com/PyO3/pyo3/issues/510 is far from being closed
  • Packaging: making sure the stubs are installed and third-party code recognizes it when running mypy
  • Testing: how do we test our stubs are correct and up to date?

Non-Goals

Use mypy internally within retworkx. The emphasis is: we want users to be able to run mypy when checking code that uses retworkx.

IvanIsCoding avatar Jun 06 '21 02:06 IvanIsCoding

I had an idea that can help us avoid duplication and make type annotations manageable. Here is what I thought:

Idea

The idea is to make .pyi stub files the source of text_signature in our Rust . The stub files look like:

class PyGraph:
    
    def has_edge(self, node_a: int, node_b: int, /) -> bool: ...

In Rust, we would load the text signature using a constant to reuse the definition originally in .pyi:

#[pyo3(text_signature = annotation::PYGRAPH_HAS_EDGE)]
pub fn has_edge(&self, node_a: usize, node_b: usize) -> bool {

Possibly we could also use an annotation! procedural macro to make it slightly better:

#[pyo3(text_signature = annotation!("PyGraph.has_edge"))]
pub fn has_edge(&self, node_a: usize, node_b: usize) -> bool {

That way, we do not need two worry about conflicting text signatures and annotations. All annotations would "magically" come from .pyi files, once we update the stub file they propagate to the Rust code.

Implementation

A Possible implementation for the idea is:

1. Create a Python script that parses the .pyi stub and generates text signatures

Leverage Python's ast module to parse .pyi files and output what PyO3 expects.

The text signatures that PyO3 expects are slightly different than what is on the .pyi file. PyO3 expects text_signature = "(self, node_a, node_b, /)" which is stripped out of the argument types and return types.

The ast module has the tools to do that. I think we can parse def has_edge(self, node_a: int, node_b: int, /) -> bool:, remove the items we do not want and then unparse back to a string we can consume.

2. Put the generated signatures as constants in a .rs

Next, the Python script needs to dump the signatures into a file Rust can consume. That would be an autogenerated .rs with all the signatures as constants. We'd export the constants under crate::annotation and reuse them in other parts of the code.

A detail for this step is that we'd need to add a CI check to verify that the autogenerated file is up to date. I think it is better to have the autogenerated file on Git because that way we do not need to add the Python script to the build process.

3. Create the macro annotation! macro to load the signatures

This is where some additional magic could happen. It could parse the tokens and convert things like PyGraph.has_edge into crate::annotation::PYGRAPH_HAS_EDGE. The former is closer to the Python definition, so it could bring some value.

IvanIsCoding avatar Aug 01 '21 20:08 IvanIsCoding

Sorry for coming late into this, especially since you have done a lot of work in #401.

We can leverage the procedural attribute macros of Rust [1] together with the parsing capabilities of the syn crate to automatically generate the .pyi stubs. This means that during compilation we will generate the stubs and we can do it conditionally by utilizing cfg_attr [2]. The idea is as follows:

  1. Define our own procedural macro where we parse the input as an ItemFn [3] and retrieve the name of the function, its arguments (name + type) and output type.
     extern crate proc_macro;
     use proc_macro::TokenStream;
     use syn;
    
     #[proc_macro_attribute]
     pub fn pyhints(attr: TokenStream, item: TokenStream) -> TokenStream {
         let input: syn::ItemFn = syn::parse(item.clone()).unwrap();
         let sig = input.sig;
         ....
    
         item
     }
    
  2. Define the mapping from Rust types to Python types.
  3. Some functions take in a PyObject but they really require a Callable. This information can be passed in the macro as an attribute.
  4. Dump all these in .pyi files with the correct syntax.

In our rust code we will write something like:

#[pyfunction]
#[cfg_attr(feature = "mypy", pyhints)]
pub fn algo(graph: PyGraph, node: usize) -> usize { ... }

The downside is that this approach does not work for the functions defined in __init__.py and we will have to do them manually. It might also be hard to do this for our custom iterators too.

georgios-ts avatar Oct 01 '21 21:10 georgios-ts

Coming back with a more detailed approach (works for functions but not for classes) that is slightly different from what I had in mind in the previous comment:

  1. Define a custom trait like

     trait ToPyHint {
          fn convert() -> String;
     }
    

    that gives the description (just a string) of the equivalent python type for rust types that we use in retworkx as arguments in pyfunctions.

  2. Implement a proc macro that receives a function definition:

    • retrieves the name of the function, its arguments (name + type) and output type.
    • emits another function marked as a #[test]. Inside the body of this new function, we'll call ToPyHint::convert() for the types that appear in the original func signature, format everything according to the stub files syntax and print in in stdout.
  3. Run the tests with cargo, collect the output and dump the lines of interest in a new .pyi file.

I have an initial working implementation https://github.com/georgios-ts/stubgen (it's not fully ready to use it in retworkx).

georgios-ts avatar Oct 10 '21 12:10 georgios-ts

Coming back with a more detailed approach (works for functions but not for classes) that is slightly different from what I had in mind in the previous comment:

  1. Define a custom trait like

     trait ToPyHint {
          fn convert() -> String;
     }
    

    that gives the description (just a string) of the equivalent python type for rust types that we use in retworkx as arguments in pyfunctions.

  2. Implement a proc macro that receives a function definition:

    • retrieves the name of the function, its arguments (name + type) and output type.
    • emits another function marked as a #[test]. Inside the body of this new function, we'll call ToPyHint::convert() for the types that appear in the original func signature, format everything according to the stub files syntax and print in in stdout.
  3. Run the tests with cargo, collect the output and dump the lines of interest in a new .pyi file.

I have an initial working implementation https://github.com/georgios-ts/stubgen (it's not fully ready to use it in retworkx).

This is a remarkable progress! I think if you post it on the PyO3 thread they would be very interested. You have the best idea/proof-of-concept code since the debate started.

I did not have time to post earlier, but the main challenge is with PyAny. PyO3 erases information when we use PyAny in Rust, and recovering it is not easy. I will take some examples from the PR, I have not figured out yet how to generate this from the Rust code:

class PyDiGraph(Generic[S, T]): 
    def compose(node_map_func: Optional[Callable[[S], int]]): ...

We can try replacing it with Any, but doing type checking wouldn't be as useful. I wonder if there is a meet-in-the-middle solution between parsing the Rust code and checking it against a reference .pyi file.

IvanIsCoding avatar Oct 11 '21 06:10 IvanIsCoding

Yeah, that's true, using PyAny / PyObject is convenient but we lose type information. Apart from manually editing .pyi files, one solution might be to define thin wrappers around a PyObject that correspond to a known type to us that we can't really tell to pyo3 and use these in our pyfunctions.

For example, as a weight function in the shortest path algorithms, we could define

struct EdgeWeightFunc(PyObject);

impl ToPyHint for EdgeWeightFunc {
      fn convert() -> String {
        String::from("Callable[[EWeight], float]")
      }
}

with the convention that:

impl ToPyHint for PyDiGraph {
      fn convert() -> String {
        String::from("PyDiGraph[NWeight, EWeight]")
      }
}

and in .pyi file we define:

NWeight = TypeVar('NWeight')
EWeight = TypeVar('EWeight')

Admittedly, it's getting a bit ugly.

georgios-ts avatar Oct 11 '21 08:10 georgios-ts

Just to play devil's advocate: would it be that bad to manually maintain .pyi files? Writing them the first time can be painful, but it can be gradual. And if you did run mypy on your codebase, you can test that the annotations are correct, so they aren't manual in the sense that you have to make sure the codebase doesn't drift away from them, even if you do still have to fix them by hand.

adriangb avatar Nov 27 '21 08:11 adriangb

Just to play devil's advocate: would it be that bad to manually maintain .pyi files? Writing them the first time can be painful, but it can be gradual. And if you did run mypy on your codebase, you can test that the annotations are correct, so they aren't manual in the sense that you have to make sure the codebase doesn't drift away from them, even if you do still have to fix them by hand.

It is not bad to maintain .pyi files by hand, at least for the frequent contributors and end-users. #401 even has a file that indicates the annotations are partial, so if someone finds a mismatch or a missing annotation they can just write a local file with the correct annotation and mypy will use that instead.

Maintaining .pyi files by hand is particularly bad for first-time contributors though. They already need to write code and create test cases, which is a lot. If we ask them to modify .pyi files too, that would be an extra task they have to do. Also, they are the most likely to forget to update those.

So if we had autogenerated annotations, it would make it easier to contribute & also keep the .pyis up to date

IvanIsCoding avatar Dec 02 '21 02:12 IvanIsCoding

they are the most likely to forget to update those

Would running MyPy on your tests help? That way if they forget they get an explicit error saying type annotations are missing for xyz method.

adriangb avatar Dec 02 '21 02:12 adriangb

they are the most likely to forget to update those

Would running MyPy on your tests help? That way if they forget they get an explicit error saying type annotations are missing for xyz method.

Running mypy on tests is not as helpful as it seems, it generally doesn't cover all cases. We're planning on going with specific tests for annotations see https://github.com/Qiskit/retworkx/blob/546f6ee9e8d1faaec7879829f8de0f85ca2b5b68/stubs-tests/pygraph_test.py for an example

IvanIsCoding avatar Dec 02 '21 02:12 IvanIsCoding

This effort was concluded by #1061, now that we are enforcing that new changes will need to add type annotations.

IvanIsCoding avatar Jan 21 '24 15:01 IvanIsCoding