hax icon indicating copy to clipboard operation
hax copied to clipboard

Add a `RewriteControlFlow` phase (`return`/`continue`/`break`)

Open W95Psp opened this issue 2 years ago • 9 comments

We want a phase that eliminates explicit control flow (say a return or a break) by moving or duplicating code.

We want to rewrite the control flow so that every return, break or continue always appears in a return position in our AST. Returns in return positions can be dropped: e.g. if ... then {return 3;} else {return 5;} can be simplified in if ... then {3} else {5}.

  • [x] rewrite control flow for returns appearing outside of a loop (this was PR #907);
  • [ ] rewrite control flow for return, break and continue within a loop.

See Details below for a lengthy explanation.

Details

Question marks (but also return and friends) used to be transformed in a monadic style.

In this style, computations are promoted to a Result or Option monad, and pure and bind nodes are inserted heavily.

For instance, let's consider the following function:

fn test(x: Option<u8>) -> Option<u8> {
    Some(f(x?) + 4)
}

This would be transformed to this monadic F* function, where (let*) is basically a bind node in the option monad:

let test (x: option u8) : option u8 =
  run_monad (
    let* hoist1:u8 = x in
      Some
      (let hoist2:u8 = f hoist1 in
        let hoist3:u8 = hoist2 + 4 in
        Some hoist3 <: option u8)
      <:
      option (option u8))

While this is a correct translation, lifting everything to a monad and running the monad is a bit heavy.

Instead, the control flow of the Rust function test can be rewritten into:

match x {
   Some(x) => Some(f(x) + 4),
   None => None
}

Rewriting control flow is about pushing ?s to return positions. When ? appears deep in a if/match node itself nested in a sequential expression (e.g. a sequence or a function call), then such rewriting can result in code duplication. Example:

fn fun(cond: bool, x: Option<u8>) -> Option<u8> {
    let b = x + 10;
    let a = if b > 20 { x? + long_code } else { 0 };
    let b = other_long_code;
    Some(a + b + 3)
}

fn fun_rewrite_step(x: Option<u8>) -> Option<u8> {
    let b = x + 10;
    if b > 20 {
        let a = x? + long_code;
        let b = other_long_code;
        Some(a + b + 3)
    } else {
        let b = other_long_code;
        Some(a + b + 3)
    };
}

fn fun_rewrited(x: Option<u8>) -> Option<u8> {
    let b = x + 10;
    if b > 20 {
        match x {
            Some(x) => {
                let a = x? + long_code;
                let b = other_long_code;
                Some(a + b + 3)
            }
            None => None,
        }
    } else {
        let b = other_long_code;
        Some(a + b + 3)
    }
}

W95Psp avatar Dec 20 '23 13:12 W95Psp

PR https://github.com/hacspec/hax/pull/559 made some progress toward this issue, but we need to improve it to close this.

W95Psp avatar Apr 30 '24 07:04 W95Psp

So if I understand well, to complete this issue we need this RewriteControlFlow phase that would come before DropNeedlessReturn and essentially deal with returns inside ifs (potentially duplicating some code).

Let's consider (a simplified version of) the example from the issue description:

fn question_mark(c: bool, x: Option<i32>) -> Option<i32> {
    let a = if c {x? + 10} else {0};
    let b = 20;
    Some(a + b)
}

And an example written with a return inside two ifs:

fn test(c1: bool, c2: bool) -> i32 {
    if c1 {
        if c2 {
            return 1;
        }
    }
    2
}

They should be rewritten respectively as:

fn question_mark_equivalent(c: bool, x: Option<i32>) -> Option<i32> {
    if c {
      let a = x? + 10;
      let b = 20;
      Some(a + b)
    } else {
       let a = 0;
       let b = 20;
       Some(a + b)
    };
    
}
fn test_equivalent(c1: bool, c2: bool) -> i32 {
    if c1 {
        if c2 {
            return 1;
        } else {
            2
        }
    } else {
        2
    }
}

The crucial rule to rewrite like this would be recognizing patterns like:

let lhs = if c then a (else b);
e

and rewrite it to:

if c {
  let lhs = a;
  e
} else {
 let lhs =  b; 
  e
}

In case there is no else originally we would insert it. And this would be done recursively, only if there is a return statement somewhere in one of the branches of the if.

To avoid recursing twice, we would need to pass the following expression top-down in the recursive calls (in an argument) and propagate the information of whether there is a return statement bottom-up (in the return value of the recursive function). We should also take care to avoid inserting an expression after a return statement (otherwise it would not be simplified by drop_needless_returns.

maximebuyse avatar Sep 19 '24 11:09 maximebuyse

We also need another phase (or a second visitor inside drop_needless_returns) to rewrite

let a = return b;
c

as

return b

As they can be nested it needs its own visitor.

maximebuyse avatar Sep 19 '24 15:09 maximebuyse

I thought #907 is only partially implementing this?

franziskuskiefer avatar Sep 25 '24 10:09 franziskuskiefer

I thought #907 is only partially implementing this?

We have a lot of issues around this. My understanding is that #907 implements this one. But it is only a part of this meta-issue: #15 . And the main part that is now missing is #196

maximebuyse avatar Sep 25 '24 10:09 maximebuyse

ok, @W95Psp please clean up and close what's done 🙏🏻

franziskuskiefer avatar Sep 25 '24 10:09 franziskuskiefer

I am designing the part about loops. First attempt:

  • Apply the current RewriteControlFlow phase inside loops bodies but treat break and continue like return. After that, all of them should be in "return position" in the loop body.
  • Add an enum in Rust_primitives.Hax with 3 variants (break, continue and return) each carrying a value (the enum should be polymorphic to allow carrying values of any type)
  • Modify the body of loops to replace each of these 3 statements by its corresponding variant from the enum. In the case of break and continue the value they should contain is the accumulator (loops are translated as a fold). A value in return position without any of these statements should be wrapped in a continue
  • Replace the whole body of the loop by a pattern matching over the accumulator that executes one more iteration only if the previous gave a continue, and just pass along a break or return
  • Wrap the whole fold inside a pattern matching to extract the result. in case we find the result wrapped in a break or continue we should just unwrap it. In case we find a return we should return, that means we should also treat this new pattern matching with the current RewrtieControlFlow phase to make sure this return is treated like the others outside loops (of course we can analyze first to know if the body of the loop contains a return or not).

For all this to work well, I suspect that we should treat the body of the loop after phase LocalMutation. It might be better also for the rest of RewriteControlFlow. This means we should also move DropNeedlessReturns but there might be a reason they come in the other order.

maximebuyse avatar Oct 02 '24 15:10 maximebuyse

I am designing the part about loops. First attempt:

* Apply the current `RewriteControlFlow` phase inside loops bodies but treat `break` and `continue` like `return`. After that, all of them should be in "return position" in the loop body.

* Add an enum in `Rust_primitives.Hax` with 3 variants (`break`, `continue` and `return`) each carrying a value (the enum should be polymorphic to allow carrying values of any type)

* Modify the body of loops to replace each of these 3 statements by its corresponding variant from the enum. In the case of `break` and `continue` the value they should contain is the accumulator (loops are translated as a `fold`). A value in return position without any of these statements should be wrapped in a `continue`

* Replace the whole body of the loop by a pattern matching over the accumulator that executes one more iteration only if the previous gave a `continue`, and just pass along a `break` or `return`

* Wrap the whole `fold` inside a pattern matching to extract the result. in case we find the result wrapped in a `break` or `continue` we should just unwrap it. In case we find a `return` we should return, that means we should also treat this new pattern matching with the current `RewrtieControlFlow` phase to make sure this `return` is treated like the others outside loops (of course we can analyze first to know if the body of the loop contains a `return` or not).

For all this to work well, I suspect that we should treat the body of the loop after phase LocalMutation. It might be better also for the rest of RewriteControlFlow. This means we should also move DropNeedlessReturns but there might be a reason they come in the other order.

A few additions:

  • I think we should rewrite the control flow in RewriteControlFlow, and also remove everything that may come after a break or continue (like we already do for return). Then LocalMutation will possibly introduce expressions after them that we can reuse to know the value we should pass.
  • We might want, instead of inserting a pattern matching in the loop body, to define a specific fold that would do the pattern matching. We will still need the pattern matching on the result of the loop (in case there is a return, otherwise we could also have a specific fold that would extract the result out of its break or continue wrapper).
  • In case there are continue but no break or return we don't need any wrapping
  • In the case of nested loops, break and continue may contain a label that allows to choose which loop they refer to. I think we should not support that (at least for now).

maximebuyse avatar Oct 03 '24 08:10 maximebuyse

Here is an example of the rewriting:

fn f (v: &[i32]) -> i32 {
    let mut sum = 0;
    for i in v {
        if *i < 0 {
            return 0
        }
        sum += *i;
    }
    sum *= 2;
    sum
}

After the transformation (here with a classic fold but we will define a specific fold that will do this internal pattern matching on the accumulator, and we will also use std::ops::ControlFlow instead of this CF enum):

fn double_sum_eq (v: &[i32]) -> i32 {
    let sum = 0;
    match (
        v.fold(|sum, i| {
            match sum {
                CF::Continue(sum) => {
                    if *i < 0 {
                        CF::Return(0)
                    } else {
                        let sum = sum + *i;
                        CF::Continue(sum)
                    }
                },
                _ => sum
            }
            
        },
        sum
        )
    ) {
        CF::Return(v) => v,
        CF::Break(v) | CF::Continue(v) =>
         {
            let sum = v;
            let sum = sum * 2;
            sum
         }
    }
}

maximebuyse avatar Oct 03 '24 15:10 maximebuyse

@maximebuyse @W95Psp what's the state here?

franziskuskiefer avatar Oct 17 '24 06:10 franziskuskiefer

We wanted to work on that together in CC, but that will not happen this week (I thought I could do CC today, but after being awake for a bit, I think it's wiser to work slowly from home and recover).

@maximebuyse correct me if I'm wrong, but I think the PR (#988) is mostly ready, only the F* parts are missing.

Thus there is two things to do:

  • [ ] clean up / make sure the PR is up-to-date and ready in terms of engine phases (@maximebuyse I think you were not certain about the order of phases?);
  • [ ] write F*.

W95Psp avatar Oct 17 '24 06:10 W95Psp

We wanted to work on that together in CC, but that will not happen this week (I thought I could do CC today, but after being awake for a bit, I think it's wiser to work slowly from home and recover).

@maximebuyse correct me if I'm wrong, but I think the PR (#988) is mostly ready, only the F* parts are missing.

Thus there is two things to do:

* [ ]  clean up / make sure the PR is up-to-date and ready in terms of engine phases ([@maximebuyse](https://github.com/maximebuyse) I think you were not certain about the order of phases?);[ ]  write F*.

I am quite happy with the order of phases now. But for sure there will be some cleanup to do, and we might change a bit the code we emit while writing the f*.

maximebuyse avatar Oct 17 '24 07:10 maximebuyse

This introduced a regression https://github.com/inria-prosecco/circus-green/actions/runs/11510030233

franziskuskiefer avatar Oct 25 '24 14:10 franziskuskiefer

#1056

franziskuskiefer avatar Oct 31 '24 15:10 franziskuskiefer