Using pyrs with monkeytype for type inference

Open flip111 opened this issue 3 years ago • 0 comments

Hello, I would like to show the results I obtained using pyrs together with monkeytype, and the process I followed.

These are the first commands I used:

cd typing
echo 'layout python3' > .envrc
direnv allow
git clone https://github.com/chonyy/fpgrowth_py.git
git clone https://github.com/Instagram/MonkeyType.git
git clone https://github.com/konchunas/pyrs.git
cd MonkeyType
python3 -m pip install -e .
cd ../fpgrowth_py
monkeytype run run.py
monkeytype list-modules # optional step that shows what `apply` accepts
monkeytype apply fpgrowth_py.utils

Here fpgrowth_py (cloned from https://github.com/chonyy/fpgrowth_py.git) is the project being converted, but it could be any project.

After that I ran

monkeytype apply fpgrowth_py.fpgrowth

But I ran into a problem:

  File "/home/flip111/typing/fpgrowth_py/fpgrowth_py/utils.py", line 7, in Node
    def __init__(self, itemName: str, frequency: int, parentNode: Optional[Node]) -> None:
NameError: name 'Node' is not defined

I worked around this by temporarily removing the annotation ": Optional[Node]", running monkeytype again, and then putting the annotation back.

After that it was time for pyrs

python3 -m pyrs fpgrowth_py/fpgrowth_py/fpgrowth.py > fpgrowth_py/fpgrowth_py/fpgrowth.rs
python3 -m pyrs fpgrowth_py/fpgrowth_py/utils.py > fpgrowth_py/fpgrowth_py/utils.rs
rustfmt fpgrowth_py/fpgrowth_py/fpgrowth.rs
rustfmt fpgrowth_py/fpgrowth_py/utils.rs

rustfmt then complains

» rustfmt fpgrowth_py/fpgrowth_py/utils.rs
error: unexpected closing delimiter: `}`
  --> /home/flip111/typing/fpgrowth_py/fpgrowth_py/utils.rs:48:1
   |
34 | fn getFromFile<T0, RT>(fname: T0) -> RT {
   |                                         - this opening brace...
...
46 | }
   | - ...matches this closing brace
47 | return (itemSetList, frequency);
48 | }
   | ^ unexpected closing delimiter

This happens because the Python source code

def getFromFile(fname):
    itemSetList = []
    frequency = []
    
    with open(fname, 'r') as file:
        csv_reader = reader(file)
        for line in csv_reader:
            line = list(filter(None, line))
            itemSetList.append(line)
            frequency.append(1)

    return itemSetList, frequency

got translated into (I indented this to make it easier to read in this post)

fn getFromFile<T0, RT>(fname: T0) -> RT {
    let mut itemSetList = vec![];
    let mut frequency = vec![];
    // with!(open(fname, "r") as file) //unsupported
    {
        let csv_reader = reader(file);
        }
        for line in csv_reader {
            line = line.into_iter().filter(None).collect::<Vec<_>>();
            itemSetList.push(line);
            frequency.push(1);
        }
    }
    return (itemSetList, frequency);
}

The with construct (here wrapping a file open) is unsupported. In addition, a stray closing brace was emitted after let csv_reader = reader(file); for some reason. I manually fixed this to

fn getFromFile<T0, RT>(fname: T0) -> RT {
    let mut itemSetList = vec![];
    let mut frequency = vec![];
    // with!(open(fname, "r") as file) //unsupported
    if let Ok(file) = std::fs::File::open(fname) {
        let csv_reader = reader(file);
        for line in csv_reader {
            line = line.into_iter().filter(None).collect::<Vec<_>>();
            itemSetList.push(line);
            frequency.push(1);
        }
    }
    return (itemSetList, frequency);
}
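
For comparison, a version of getFromFile that compiles with only the standard library might look like the sketch below. It is not what pyrs emits: the concrete types (Vec<Vec<String>>, Vec<i32>), the snake_case name get_from_file, and the plain comma split standing in for Python's csv.reader are all assumptions made for illustration, and quoted CSV fields are not handled.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Reads a comma-separated file and returns one item list per line,
/// together with a frequency of 1 for every line.
/// Assumption: fields never contain quoted commas, so a plain split is enough.
fn get_from_file(fname: &str) -> io::Result<(Vec<Vec<String>>, Vec<i32>)> {
    let mut item_set_list = Vec::new();
    let mut frequency = Vec::new();

    // The file is closed automatically when the reader is dropped,
    // which plays the role of Python's `with open(...)` block.
    let file = File::open(fname)?;
    for line in BufReader::new(file).lines() {
        let line = line?;
        // Counterpart of `list(filter(None, line))`: drop empty fields.
        let items: Vec<String> = line
            .split(',')
            .filter(|field| !field.is_empty())
            .map(str::to_string)
            .collect();
        item_set_list.push(items);
        frequency.push(1);
    }
    Ok((item_set_list, frequency))
}
```

Returning io::Result instead of silently skipping a missing file (as the if let Ok(...) version does) is a design choice of this sketch; the caller then decides how to handle the error.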

I had previously run this process without the monkeytype step, so I was able to make a diff of the resulting Rust source code. Here is a diff of the pygrowth.rs file, which shows that with monkeytype a lot more concrete types also end up in the Rust output

9,11c9,11
<     itemName: ST0,
<     count: ST1,
<     parent: ST2,
---
>     itemName: &str,
>     count: i32,
>     parent: Option<Node>,
17c17
<     fn __init__<T0, T1, T2>(&self, itemName: T0, frequency: T1, parentNode: T2) {
---
>     fn __init__(&self, itemName: &str, frequency: i32, parentNode: Option<Node>) {
24c24
<     fn increment<T0>(&self, frequency: T0) {
---
>     fn increment(&self, frequency: i32) {
54c54,58
< fn constructTree<T0, T1, T2, RT>(itemSetList: T0, frequency: T1, minSup: T2) -> RT {
---
> fn constructTree(
>     itemSetList: Vec<Union<Any, Vec<&str>>>,
>     frequency: Vec<Union<Any, i32>>,
>     minSup: f32,
> ) -> Union<(None, None), (Node, HashMap<&str, Vec<Union<i32, Node>>>)> {
92c96,103
< fn updateHeaderTable<T0, T1, T2>(item: T0, targetNode: T1, headerTable: T2) {
---
> fn updateHeaderTable(
>     item: &str,
>     targetNode: Node,
>     headerTable: HashMap<
>         &str,
>         Union<Vec<Option<i32>>, Vec<Option<Union<i32, Node>>>, Vec<Union<i32, Node>>>,
>     >,
> ) {
103c114,122
< fn updateTree<T0, T1, T2, T3, RT>(item: T0, treeNode: T1, headerTable: T2, frequency: T3) -> RT {
---
> fn updateTree(
>     item: &str,
>     treeNode: Node,
>     headerTable: HashMap<
>         &str,
>         Union<Vec<Option<i32>>, Vec<Option<Union<i32, Node>>>, Vec<Union<i32, Node>>>,
>     >,
>     frequency: i32,
> ) -> Node {
113c132
< fn ascendFPtree<T0, T1>(node: T0, prefixPath: T1) {
---
> fn ascendFPtree(node: Node, prefixPath: Vec<Union<Any, &str>>) {
119c138,141
< fn findPrefixPath<T0, T1, RT>(basePat: T0, headerTable: T1) -> RT {
---
> fn findPrefixPath(
>     basePat: &str,
>     headerTable: HashMap<&str, Vec<Union<i32, Node>>>,
> ) -> Union<(Vec<Any>, Vec<Any>), (Vec<Vec<&str>>, Vec<i32>)> {
134c156,161
< fn mineTree<T0, T1, T2, T3>(headerTable: T0, minSup: T1, preFix: T2, freqItemList: T3) {
---
> fn mineTree(
>     headerTable: HashMap<&str, Vec<Union<i32, Node>>>,
>     minSup: f32,
>     preFix: Set<&str>,
>     freqItemList: Vec<Union<Set<&str>, Any>>,
> ) {
151c178
< fn powerset<T0, RT>(s: T0) -> RT {
---
> fn powerset(s: Set<&str>) -> chain {
159c186
< fn getSupport<T0, T1, RT>(testSet: T0, itemSetList: T1) -> RT {
---
> fn getSupport(testSet: Union<Set<&str>, (&str)>, itemSetList: Vec<Vec<&str>>) -> i32 {
168c195,199
< fn associationRule<T0, T1, T2, RT>(freqItemSet: T0, itemSetList: T1, minConf: T2) -> RT {
---
> fn associationRule(
>     freqItemSet: Vec<Set<&str>>,
>     itemSetList: Vec<Vec<&str>>,
>     minConf: f32,
> ) -> Vec<Vec<Union<Set<&str>, f32>>> {
182c213
< fn getFrequencyFromList<T0, RT>(itemSetList: T0) -> RT {
---
> fn getFrequencyFromList(itemSetList: Vec<Vec<&str>>) -> Vec<i32> {
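
The annotated signatures are more concrete, but they are still not valid Rust: Union, Any and Set are names from Python's typing module, a struct that contains itself by value (parent: Option<Node>) has no finite size, and &str fields would need a lifetime parameter. A hand-written sketch of how these types could be expressed is shown below. The names CountOrNode, HeaderTable and ItemSet are hypothetical, and using String and Box is one possible choice, not something pyrs produces.

```rust
use std::collections::{HashMap, HashSet};

/// Hand-written counterpart of the generated `Node` struct.
/// `String` avoids a lifetime parameter and `Box` gives the recursive
/// `parent` field a known size.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    item_name: String,
    count: i32,
    parent: Option<Box<Node>>,
}

impl Node {
    fn new(item_name: &str, frequency: i32, parent: Option<Node>) -> Self {
        Node {
            item_name: item_name.to_string(),
            count: frequency,
            parent: parent.map(Box::new),
        }
    }

    /// Needs `&mut self` because it modifies the node.
    fn increment(&mut self, frequency: i32) {
        self.count += frequency;
    }
}

/// Stand-in for `Union<i32, Node>`: an enum lists the alternatives explicitly.
#[derive(Debug, Clone, PartialEq)]
enum CountOrNode {
    Count(i32),
    Node(Node),
}

/// Possible Rust spellings of the inferred Python container types.
type HeaderTable = HashMap<String, Vec<CountOrNode>>;
type ItemSet = HashSet<String>;
```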

After this I copied the two new source files into a new project

cargo new fpgrowth_rs
cp fpgrowth_py/fpgrowth_py/fpgrowth.rs fpgrowth_rs/src
cp fpgrowth_py/fpgrowth_py/utils.rs fpgrowth_rs/src
cd fpgrowth_rs

I added a module declaration for fpgrowth to src/main.rs

mod fpgrowth;

fn main() {
    println!("Hello, world!");
}

I then tried to fix the source files with clippy

cargo clippy --fix --allow-dirty

Clippy reported the following errors

» cargo clippy --fix --allow-dirty
    Checking fpgrowth_rs v0.1.0 (/home/flip111/typing/fpgrowth_rs)
error[E0433]: failed to resolve: use of undeclared crate or module `fpgrowth_py`
 --> src/fpgrowth.rs:6:5
  |
6 | use fpgrowth_py::utils::*;
  |     ^^^^^^^^^^^ use of undeclared crate or module `fpgrowth_py`
  |
help: there is a crate or module with a similar name
  |
6 | use fpgrowth::utils::*;
  |     ~~~~~~~~

error[E0432]: unresolved imports `collections::defaultdict`, `collections::OrderedDict`
 --> src/fpgrowth.rs:4:19
  |
4 | use collections::{defaultdict, OrderedDict};
  |                   ^^^^^^^^^^^  ^^^^^^^^^^^ no `OrderedDict` in `collections`
  |                   |
  |                   no `defaultdict` in `collections`

error[E0432]: unresolved import `csv`
 --> src/fpgrowth.rs:5:5
  |
5 | use csv::reader;
  |     ^^^ use of undeclared crate or module `csv`

error[E0432]: unresolved import `itertools`
 --> src/fpgrowth.rs:7:5
  |
7 | use itertools::{chain, combinations};
  |     ^^^^^^^^^ use of undeclared crate or module `itertools`

error[E0432]: unresolved import `optparse`
 --> src/fpgrowth.rs:8:5
  |
8 | use optparse::OptionParser;
  |     ^^^^^^^^ use of undeclared crate or module `optparse`

error[E0412]: cannot find type `Set` in this scope
  --> src/fpgrowth.rs:14:11
   |
14 | ) -> (Vec<Set<&str>>, Vec<Vec<Union<Set<&str>, f32>>>) {
   |           ^^^ not found in this scope

error[E0412]: cannot find type `Union` in this scope
  --> src/fpgrowth.rs:14:31
   |
14 | ) -> (Vec<Set<&str>>, Vec<Vec<Union<Set<&str>, f32>>>) {
   |                               ^^^^^ not found in this scope
   |
help: consider importing one of these items
   |
1  | use crate::fpgrowth::collections::btree_set::Union;
   |
1  | use crate::fpgrowth::collections::hash_set::Union;
   |
1  | use std::collections::btree_set::Union;
   |
1  | use std::collections::hash_set::Union;
   |

error[E0412]: cannot find type `Set` in this scope
  --> src/fpgrowth.rs:14:37
   |
14 | ) -> (Vec<Set<&str>>, Vec<Vec<Union<Set<&str>, f32>>>) {
   |                                     ^^^ not found in this scope

error[E0425]: cannot find function `getFrequencyFromList` in this scope
  --> src/fpgrowth.rs:15:21
   |
15 |     let frequency = getFrequencyFromList(itemSetList);
   |                     ^^^^^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `constructTree` in this scope
  --> src/fpgrowth.rs:17:33
   |
17 |     let (fpTree, headerTable) = constructTree(itemSetList, frequency, minSup);
   |                                 ^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `mineTree` in this scope
  --> src/fpgrowth.rs:22:9
   |
22 |         mineTree(headerTable, minSup, set(), freqItems);
   |         ^^^^^^^^ not found in this scope

error[E0425]: cannot find function `set` in this scope
  --> src/fpgrowth.rs:22:39
   |
22 |         mineTree(headerTable, minSup, set(), freqItems);
   |                                       ^^^ not found in this scope

error[E0425]: cannot find function `associationRule` in this scope
  --> src/fpgrowth.rs:23:21
   |
23 |         let rules = associationRule(freqItems, itemSetList, minConf);
   |                     ^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `getFromFile` in this scope
  --> src/fpgrowth.rs:28:36
   |
28 |     let (itemSetList, frequency) = getFromFile(fname);
   |                                    ^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `constructTree` in this scope
  --> src/fpgrowth.rs:30:33
   |
30 |     let (fpTree, headerTable) = constructTree(itemSetList, frequency, minSup);
   |                                 ^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `mineTree` in this scope
  --> src/fpgrowth.rs:35:9
   |
35 |         mineTree(headerTable, minSup, set(), freqItems);
   |         ^^^^^^^^ not found in this scope

error[E0425]: cannot find function `set` in this scope
  --> src/fpgrowth.rs:35:39
   |
35 |         mineTree(headerTable, minSup, set(), freqItems);
   |                                       ^^^ not found in this scope

error[E0425]: cannot find function `associationRule` in this scope
  --> src/fpgrowth.rs:36:21
   |
36 |         let rules = associationRule(freqItems, itemSetList, minConf);
   |                     ^^^^^^^^^^^^^^^ not found in this scope

warning: unused import: `std::collections::HashMap`
 --> src/fpgrowth.rs:1:5
  |
1 | use std::collections::HashMap;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unnecessary parentheses around assigned value
  --> src/fpgrowth.rs:16:18
   |
16 |     let minSup = (itemSetList.len() * minSupRatio);
   |                  ^                               ^
   |
   = note: `#[warn(unused_parens)]` on by default
help: remove these parentheses
   |
16 -     let minSup = (itemSetList.len() * minSupRatio);
16 +     let minSup = itemSetList.len() * minSupRatio;
   | 

warning: unnecessary parentheses around assigned value
  --> src/fpgrowth.rs:29:18
   |
29 |     let minSup = (itemSetList.len() * minSupRatio);
   |                  ^                               ^
   |
help: remove these parentheses
   |
29 -     let minSup = (itemSetList.len() * minSupRatio);
29 +     let minSup = itemSetList.len() * minSupRatio;
   | 

error[E0277]: cannot multiply `usize` by `f32`
  --> src/fpgrowth.rs:16:37
   |
16 |     let minSup = (itemSetList.len() * minSupRatio);
   |                                     ^ no implementation for `usize * f32`
   |
   = help: the trait `std::ops::Mul<f32>` is not implemented for `usize`

error[E0308]: mismatched types
  --> src/fpgrowth.rs:31:23
   |
27 |   fn fpgrowthFromFile<T0, T1, T2, RT>(fname: T0, minSupRatio: T1, minConf: T2) -> RT {
   |                                   -- this type parameter
...
31 |       if fpTree == None {
   |  _______________________^
32 | |         println!("{:?} ", "No frequent item set");
33 | |     } else {
   | |_____^ expected type parameter `RT`, found `()`
   |
   = note: expected type parameter `RT`
                   found unit type `()`

error[E0308]: mismatched types
  --> src/fpgrowth.rs:37:16
   |
27 | fn fpgrowthFromFile<T0, T1, T2, RT>(fname: T0, minSupRatio: T1, minConf: T2) -> RT {
   |                                 -- this type parameter                          -- expected `RT` because of return type
...
37 |         return (freqItems, rules);
   |                ^^^^^^^^^^^^^^^^^^ expected type parameter `RT`, found tuple
   |
   = note: expected type parameter `RT`
                       found tuple `(std::vec::Vec<_>, _)`

Some errors have detailed explanations: E0277, E0308, E0412, E0425, E0432, E0433.
For more information about an error, try `rustc --explain E0277`.
warning: `fpgrowth_rs` (bin "fpgrowth_rs" test) generated 3 warnings
error: could not compile `fpgrowth_rs` due to 21 previous errors; 3 warnings emitted
warning: build failed, waiting for other jobs to finish...
warning: `fpgrowth_rs` (bin "fpgrowth_rs") generated 3 warnings (3 duplicates)
error: build failed
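
Two of these errors have small, mechanical fixes. E0277 arises because Rust does not convert between numeric types implicitly, and the unresolved set() call has no Rust function of that name; an empty HashSet is the usual counterpart. A minimal sketch, where the helper names min_support and empty_prefix and the element type String are assumptions for illustration:

```rust
use std::collections::HashSet;

/// Computes the minimum support from a ratio.
/// The explicit `as f32` cast is what the generated
/// `itemSetList.len() * minSupRatio` is missing.
fn min_support(item_set_list: &[Vec<String>], min_sup_ratio: f32) -> f32 {
    item_set_list.len() as f32 * min_sup_ratio
}

/// Rust spelling of Python's `set()`: an empty `HashSet`.
fn empty_prefix() -> HashSet<String> {
    HashSet::new()
}
```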

I have yet to inspect these errors and figure out whether they are best fixed before or after running pyrs.
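
The two E0308 errors come from the generic return type RT: with a type parameter the caller chooses the type, so the function body can neither return a concrete tuple nor fall through with (). One way to express "either nothing or a pair of results" is a concrete Option return type. The sketch below only shows that shape; the names MiningResult and mine_placeholder are hypothetical and the body is a placeholder, not the FP-growth algorithm.

```rust
/// Hypothetical result type: frequent item sets and one count per set.
type MiningResult = (Vec<Vec<String>>, Vec<usize>);

/// Returns `None` when there is nothing to mine, mirroring the branch
/// that only prints "No frequent item set".
/// The body is a placeholder that counts each item set once.
fn mine_placeholder(item_set_list: &[Vec<String>]) -> Option<MiningResult> {
    if item_set_list.is_empty() {
        println!("No frequent item set");
        return None;
    }
    let freq_items = item_set_list.to_vec();
    let support = vec![1; freq_items.len()];
    Some((freq_items, support))
}
```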


Conclusion:

  • The experiment was nice to do.
  • monkeytype seems promising for inferring types.
  • The c2rust-refactor tool might be helpful here as well, but I didn't try it: https://github.com/immunant/c2rust/tree/master/c2rust-refactor https://c2rust.com/manual/c2rust-refactor/commands.html
  • There is still manual work to do: Python things like itertools don't yet get translated to their Rust equivalent.
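
As an illustration of the last point, itertools-style helpers can often be written by hand against the standard library. The sketch below enumerates the non-empty subsets of a slice with a bitmask. The function name and the choice of which subsets to include are assumptions, and may differ from what powerset in fpgrowth_py returns.

```rust
/// Returns every non-empty subset of `items`, ordered by bitmask value.
/// Intended for small inputs: the result has 2^n - 1 entries.
fn non_empty_subsets<T: Clone>(items: &[T]) -> Vec<Vec<T>> {
    let n = items.len();
    assert!(n < usize::BITS as usize, "too many items for a bitmask");
    let mut subsets = Vec::new();
    // Bit i of `mask` decides whether items[i] is part of the subset.
    for mask in 1usize..(1usize << n) {
        let subset: Vec<T> = items
            .iter()
            .enumerate()
            .filter(|&(i, _)| (mask & (1usize << i)) != 0)
            .map(|(_, item)| item.clone())
            .collect();
        subsets.push(subset);
    }
    subsets
}
```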

flip111, Dec 31 '21 14:12