lean4 Do notation inhibits tail call optimization

The use of return inside do notation causes this tail-recursive loop to be compiled into two mutually recursive functions, which breaks TCO. The version using if-else instead (bar below) optimizes correctly.

set_option trace.compiler.ir.result true

partial def foo (n : Nat) : Nat := do
  if n == 10 then return 0
  foo (n+1)

-- def foo._lambda_1 (x_1 : @& obj) (x_2 : @& obj) : obj :=
--   let x_3 : obj := 1;
--   let x_4 : obj := Nat.add x_1 x_3;
--   let x_5 : obj := foo x_4;
--   dec x_4;
--   ret x_5
-- def foo (x_1 : @& obj) : obj :=
--   let x_2 : obj := 10;
--   let x_3 : u8 := Nat.decEq x_1 x_2;
--   case x_3 : u8 of
--   Bool.false →
--     let x_4 : obj := ctor_0[PUnit.unit];
--     let x_5 : obj := foo._lambda_1 x_1 x_4;
--     ret x_5
--   Bool.true →
--     let x_6 : obj := 0;
--     ret x_6

partial def bar (n : Nat) : Nat := do
  if n == 10 then 0 else
  bar (n+1)

-- def bar (x_1 : obj) : obj :=
--   let x_2 : obj := 10;
--   let x_3 : u8 := Nat.decEq x_1 x_2;
--   case x_3 : u8 of
--   Bool.false →
--     let x_4 : obj := 1;
--     let x_5 : obj := Nat.add x_1 x_4;
--     dec x_1;
--     let x_6 : obj := bar x_5;
--     ret x_6
--   Bool.true →
--     dec x_1;
--     let x_7 : obj := 0;
--     ret x_7

For reference, the do desugaring produces a term that looks roughly like this:

  let _do_jp : PUnit → Id Nat := fun (y : PUnit) => foo (n + 1);
  if n == 10 = true then pure 0 else
    do 
      let y ← pure PUnit.unit 
      _do_jp y

The foo._lambda_1 code corresponds to _do_jp here.

May 07 '21 00:05 digama0

To add on to this, if-else with return does compile the tail call correctly:

partial def foobar (n : Nat) : Nat := do
  if n == 10 then return 0 else return foobar (n+1)

/- 
def foobar (x_1 : obj) : obj :=
  let x_2 : obj := 10;
  let x_3 : u8 := Nat.decEq x_1 x_2;
  case x_3 : u8 of
  Bool.false →
    let x_4 : obj := 1;
    let x_5 : obj := Nat.add x_1 x_4;
    dec x_1;
    let x_6 : obj := foobar x_5;
    ret x_6
  Bool.true →
    dec x_1;
    let x_7 : obj := 0;
    ret x_7
-/

As such, I would say this has less to do with the do notation and return, and more to do with if and return. It looks to me like Lean is not optimizing an if statement ending with a return (i.e., a terminal if) into an if-else. That is, ideally, the following optimization would exist:

if bool then
  ...
  return x
...
return y

becomes

if bool then
  ...
  return x
else
  ...
  return y

May 09 '21 18:05 tydeu

Keep in mind that return in do-notation is an early exit, it is not pure. So it is reasonable to me that using return, especially in if conditions, will produce a join-point based desugaring. That said, it would probably be good to detect patterns like if cond then return x; y and if cond then return x else return y that can be modeled using just plain if-else and don't need the extra power of early exit, because the plain if-else version is generating better code even if we ignore the TCO issue.

But to really solve the TCO issue in general, including all the do sugar (the usual definition of "tail position" would not exclude these examples), what is needed is to support compilation of mutual tail calls, possibly using the mustcall attribute as indicated by @Kha .

May 09 '21 19:05 digama0

Keep in mind that return in do-notation is an early exit,

My apologies, but I am not sure what you are getting at here. I am aware that return is an early exit, but I fail to see how it applies to the points I was making.

May 09 '21 19:05 tydeu

Ah, I misinterpreted your point. Ignore that first sentence.

May 09 '21 19:05 digama0

But to really solve the TCO issue in general, including all the do sugar (the usual definition of "tail position" would not exclude these examples), what is needed is to support compilation of mutual tail calls, possibly using the mustcall attribute as indicated by @Kha.

This is also an interesting point. Even with the mutual recursive functions, aren't both functions preforming tail calls (i.e. foo calls the lambda in tail position and lambda calls foo in the tail position), so shouldn't they be optimized as such? It doesn't seem like the addition of the lambda should break that. So does the tail call optimization problems really have anything to do with syntax and the result IR? Doesn't it instead have to do with the later TCO?

May 09 '21 19:05 tydeu

OOPS! I just realized I made a major error in my previous foobar and pure_bar definitions. They are both using bar rather than recursing on themselves. With the proper definitions my previous observations about the boxed/unboxed versions disappears (as the correct versions also only produce boxed versions). I have also updated/deleted my previous posts to help clean the thread and remove the noise. Sorry about this!

May 09 '21 20:05 tydeu

But to really solve the TCO issue in general, including all the do sugar (the usual definition of "tail position" would not exclude these examples), what is needed is to support compilation of mutual tail calls, possibly using the mustcall attribute as indicated by @Kha.

This is also an interesting point. Even with the mutual recursive functions, aren't both functions preforming tail calls (i.e. foo calls the lambda in tail position and lambda calls foo in the tail position), so shouldn't they be optimized as such? It doesn't seem like the addition of the lambda should break that. So does the tail call optimization problems really have anything to do with syntax and the result IR? Doesn't it instead have to do with the later TCO?

Right, these are orthogonal issues. do notation produces suboptimal desugaring involving lambdas when they aren't necessary, and mutual TCO doesn't work. The original example would be fixed if either of these is addressed, but I think addressing the latter problem is more likely to solve more general examples where eliding the lambda isn't possible.

May 09 '21 20:05 digama0

Right, these are orthogonal issues. do notation produces suboptimal desugaring involving lambdas when they aren't necessary, and mutual TCO doesn't work.

I still find it interesting that you are laying the desugaring problem at the foot of the do notation in general. I would blame, more specifically, the imperative if . The if produces the joint point lambda that leads to this problem. Many other uses of do would not have this problem. I am also curious as to why the lambda isn't being inlined considering its small size (and, in this case, single usage) -- such inlining would also solve the TCO problem (for this example).

May 09 '21 20:05 tydeu

I would blame, more specifically, the imperative if .

Sure. The do notation desugaring is a big and complicated thing in lean 4, and there are several possible ways for join points to be introduced. At least unless would also need to be optimized similarly (that was what I used in the real world example from which this MWE was extracted), but looking at the do-IR directly would probably yield a more general optimization pattern.

I am also curious as to why the lambda isn't being inlined considering its small size (and, in this case, single usage) -- such inlining would also solve the TCO problem (for this example).

I imagine that (mutual) recursive calls might be inhibiting inlining in this case. It can be tricky to determine when it's actually a win. But I don't really know anything about how the inliner works.

May 09 '21 21:05 digama0

lean4 lean4 copied to clipboard

Do notation inhibits tail call optimization

lean4
lean4 copied to clipboard