Add a `RewriteControlFlow` phase (`return`/`continue`/`break`)
We want a phase that eliminates explicit control flow (say a return or a break) by moving or duplicating code.
We want to rewrite the control flow so that every return, break or continue always appears in a return position in our AST. Returns in return positions can be dropped: e.g. if ... then {return 3;} else {return 5;} can be simplified in if ... then {3} else {5}.
- [x] rewrite control flow for
returns appearing outside of a loop (this was PR #907); - [ ] rewrite control flow for
return,breakandcontinuewithin a loop.
See Details below for a lengthy explanation.
Details
Question marks (but also return and friends) used to be transformed in a monadic style.
In this style, computations are promoted to a Result or Option monad, and pure and bind nodes are inserted heavily.
For instance, let's consider the following function:
fn test(x: Option<u8>) -> Option<u8> {
Some(f(x?) + 4)
}
This would be transformed to this monadic F* function, where (let*) is basically a bind node in the option monad:
let test (x: option u8) : option u8 =
run_monad (
let* hoist1:u8 = x in
Some
(let hoist2:u8 = f hoist1 in
let hoist3:u8 = hoist2 + 4 in
Some hoist3 <: option u8)
<:
option (option u8))
While this is a correct translation, lifting everything to a monad and running the monad is a bit heavy.
Instead, the control flow of the Rust function test can be rewritten into:
match x {
Some(x) => Some(f(x) + 4),
None => None
}
Rewriting control flow is about pushing ?s to return positions. When ? appears deep in a if/match node itself nested in a sequential expression (e.g. a sequence or a function call), then such rewriting can result in code duplication. Example:
fn fun(cond: bool, x: Option<u8>) -> Option<u8> {
let b = x + 10;
let a = if b > 20 { x? + long_code } else { 0 };
let b = other_long_code;
Some(a + b + 3)
}
fn fun_rewrite_step(x: Option<u8>) -> Option<u8> {
let b = x + 10;
if b > 20 {
let a = x? + long_code;
let b = other_long_code;
Some(a + b + 3)
} else {
let b = other_long_code;
Some(a + b + 3)
};
}
fn fun_rewrited(x: Option<u8>) -> Option<u8> {
let b = x + 10;
if b > 20 {
match x {
Some(x) => {
let a = x? + long_code;
let b = other_long_code;
Some(a + b + 3)
}
None => None,
}
} else {
let b = other_long_code;
Some(a + b + 3)
}
}
PR https://github.com/hacspec/hax/pull/559 made some progress toward this issue, but we need to improve it to close this.
So if I understand well, to complete this issue we need this RewriteControlFlow phase that would come before DropNeedlessReturn and essentially deal with returns inside ifs (potentially duplicating some code).
Let's consider (a simplified version of) the example from the issue description:
fn question_mark(c: bool, x: Option<i32>) -> Option<i32> {
let a = if c {x? + 10} else {0};
let b = 20;
Some(a + b)
}
And an example written with a return inside two ifs:
fn test(c1: bool, c2: bool) -> i32 {
if c1 {
if c2 {
return 1;
}
}
2
}
They should be rewritten respectively as:
fn question_mark_equivalent(c: bool, x: Option<i32>) -> Option<i32> {
if c {
let a = x? + 10;
let b = 20;
Some(a + b)
} else {
let a = 0;
let b = 20;
Some(a + b)
};
}
fn test_equivalent(c1: bool, c2: bool) -> i32 {
if c1 {
if c2 {
return 1;
} else {
2
}
} else {
2
}
}
The crucial rule to rewrite like this would be recognizing patterns like:
let lhs = if c then a (else b);
e
and rewrite it to:
if c {
let lhs = a;
e
} else {
let lhs = b;
e
}
In case there is no else originally we would insert it.
And this would be done recursively, only if there is a return statement somewhere in one of the branches of the if.
To avoid recursing twice, we would need to pass the following expression top-down in the recursive calls (in an argument) and propagate the information of whether there is a return statement bottom-up (in the return value of the recursive function). We should also take care to avoid inserting an expression after a return statement (otherwise it would not be simplified by drop_needless_returns.
We also need another phase (or a second visitor inside drop_needless_returns) to rewrite
let a = return b;
c
as
return b
As they can be nested it needs its own visitor.
I thought #907 is only partially implementing this?
I thought #907 is only partially implementing this?
We have a lot of issues around this. My understanding is that #907 implements this one. But it is only a part of this meta-issue: #15 . And the main part that is now missing is #196
ok, @W95Psp please clean up and close what's done 🙏🏻
I am designing the part about loops. First attempt:
- Apply the current
RewriteControlFlowphase inside loops bodies but treatbreakandcontinuelikereturn. After that, all of them should be in "return position" in the loop body. - Add an enum in
Rust_primitives.Haxwith 3 variants (break,continueandreturn) each carrying a value (the enum should be polymorphic to allow carrying values of any type) - Modify the body of loops to replace each of these 3 statements by its corresponding variant from the enum. In the case of
breakandcontinuethe value they should contain is the accumulator (loops are translated as afold). A value in return position without any of these statements should be wrapped in acontinue - Replace the whole body of the loop by a pattern matching over the accumulator that executes one more iteration only if the previous gave a
continue, and just pass along abreakorreturn - Wrap the whole
foldinside a pattern matching to extract the result. in case we find the result wrapped in abreakorcontinuewe should just unwrap it. In case we find areturnwe should return, that means we should also treat this new pattern matching with the currentRewrtieControlFlowphase to make sure thisreturnis treated like the others outside loops (of course we can analyze first to know if the body of the loop contains areturnor not).
For all this to work well, I suspect that we should treat the body of the loop after phase LocalMutation. It might be better also for the rest of RewriteControlFlow. This means we should also move DropNeedlessReturns but there might be a reason they come in the other order.
I am designing the part about loops. First attempt:
* Apply the current `RewriteControlFlow` phase inside loops bodies but treat `break` and `continue` like `return`. After that, all of them should be in "return position" in the loop body. * Add an enum in `Rust_primitives.Hax` with 3 variants (`break`, `continue` and `return`) each carrying a value (the enum should be polymorphic to allow carrying values of any type) * Modify the body of loops to replace each of these 3 statements by its corresponding variant from the enum. In the case of `break` and `continue` the value they should contain is the accumulator (loops are translated as a `fold`). A value in return position without any of these statements should be wrapped in a `continue` * Replace the whole body of the loop by a pattern matching over the accumulator that executes one more iteration only if the previous gave a `continue`, and just pass along a `break` or `return` * Wrap the whole `fold` inside a pattern matching to extract the result. in case we find the result wrapped in a `break` or `continue` we should just unwrap it. In case we find a `return` we should return, that means we should also treat this new pattern matching with the current `RewrtieControlFlow` phase to make sure this `return` is treated like the others outside loops (of course we can analyze first to know if the body of the loop contains a `return` or not).For all this to work well, I suspect that we should treat the body of the loop after phase
LocalMutation. It might be better also for the rest ofRewriteControlFlow. This means we should also moveDropNeedlessReturnsbut there might be a reason they come in the other order.
A few additions:
- I think we should rewrite the control flow in
RewriteControlFlow, and also remove everything that may come after abreakorcontinue(like we already do forreturn). ThenLocalMutationwill possibly introduce expressions after them that we can reuse to know the value we should pass. - We might want, instead of inserting a pattern matching in the loop body, to define a specific fold that would do the pattern matching. We will still need the pattern matching on the result of the loop (in case there is a return, otherwise we could also have a specific fold that would extract the result out of its
breakorcontinuewrapper). - In case there are
continuebut nobreakorreturnwe don't need any wrapping - In the case of nested loops,
breakandcontinuemay contain a label that allows to choose which loop they refer to. I think we should not support that (at least for now).
Here is an example of the rewriting:
fn f (v: &[i32]) -> i32 {
let mut sum = 0;
for i in v {
if *i < 0 {
return 0
}
sum += *i;
}
sum *= 2;
sum
}
After the transformation (here with a classic fold but we will define a specific fold that will do this internal pattern matching on the accumulator, and we will also use std::ops::ControlFlow instead of this CF enum):
fn double_sum_eq (v: &[i32]) -> i32 {
let sum = 0;
match (
v.fold(|sum, i| {
match sum {
CF::Continue(sum) => {
if *i < 0 {
CF::Return(0)
} else {
let sum = sum + *i;
CF::Continue(sum)
}
},
_ => sum
}
},
sum
)
) {
CF::Return(v) => v,
CF::Break(v) | CF::Continue(v) =>
{
let sum = v;
let sum = sum * 2;
sum
}
}
}
@maximebuyse @W95Psp what's the state here?
We wanted to work on that together in CC, but that will not happen this week (I thought I could do CC today, but after being awake for a bit, I think it's wiser to work slowly from home and recover).
@maximebuyse correct me if I'm wrong, but I think the PR (#988) is mostly ready, only the F* parts are missing.
Thus there is two things to do:
- [ ] clean up / make sure the PR is up-to-date and ready in terms of engine phases (@maximebuyse I think you were not certain about the order of phases?);
- [ ] write F*.
We wanted to work on that together in CC, but that will not happen this week (I thought I could do CC today, but after being awake for a bit, I think it's wiser to work slowly from home and recover).
@maximebuyse correct me if I'm wrong, but I think the PR (#988) is mostly ready, only the F* parts are missing.
Thus there is two things to do:
* [ ] clean up / make sure the PR is up-to-date and ready in terms of engine phases ([@maximebuyse](https://github.com/maximebuyse) I think you were not certain about the order of phases?);[ ] write F*.
I am quite happy with the order of phases now. But for sure there will be some cleanup to do, and we might change a bit the code we emit while writing the f*.
This introduced a regression https://github.com/inria-prosecco/circus-green/actions/runs/11510030233
#1056