binaryen feat: implement tail call optimization

Fixed: #7647 Now it is just a generic concept about tail call optimization. I am not sure whether I am in the correct way. So I put the draft version and expect to get some feedback.

The basic idea is we can convert

(func ...
   (call $f)
)

to

(func ...
   (return_call $f)
)

(func ...
   (return (call $f))
)

to

(func ...
   (return_call $f)
)

(func ...
   (if
      (condition)
      (call $f)
   )
)

to

(func ...
   (if
      (condition)
      (return_call $f)
   )
)

Jun 06 '25 08:06 HerrCai0907

I believe this is valid, though I also vaguely remember there was a reason it might not always be helpful... @fgmccabe @tlively do you know what the best practices are for automatic conversion of normal calls to tail calls (when the return type permits)?

Jun 10 '25 19:06 kripken

The only downside I can think of is that this might lead to confusing stack traces. It would also be good to double check with a microbenchmark that this is actually good for performance; I wouldn't be too surprised if tail calls are slow because the engine needs to do some kind of extra shuffling to make them work.

Jun 10 '25 20:06 tlively

Not sure if this is strictly on point: but automatically converting last calls to tail calls may (will) affect semantics of applications. return_call is useful for so-called tail optimization. But, it's primary purpose is in supporting languages (particularly functional[?] languages like Scheme) that depend on this for their semantics. Since this is not part of C/C++'s semantics, using return_call for emscriptem represents a risk.

Jun 10 '25 20:06 fgmccabe

Can you elaborate on what the observable semantic differences are? Stack exhaustion doesn't count, since the number of stack frames is an implementation limit and therefore allowed to be broadly nondeterministic, even within a single execution.

Jun 10 '25 21:06 tlively

Actually, stack exhaustion, as you put it, DOES count (in languages like Scheme). Other considerations include meta-level features such as dynamic scope, profiling, etc.

Jun 10 '25 23:06 fgmccabe

I agree there are no semantic related issue, combine call and return should not have semantic changing except call stack.

It would also be good to double check with a microbenchmark that this is actually good for performance

According to https://v8.dev/blog/wasm-tail-call#proposal, It should be have some improvement, but I am not sure whether v8 itself do this optimization for some simple cases. I will do this benchmark later.

Jun 11 '25 01:06 HerrCai0907

Aha, dynamic scope is the issue. @HerrCai0907, you'll have to make sure that the optimized calls are not inside try or try_table.

Jun 11 '25 15:06 tlively

you'll have to make sure that the optimized calls are not inside try or try_table.

I think the current implement already consider it.

Jun 13 '25 08:06 HerrCai0907

It would also be good to double check with a microbenchmark that this is actually good for performance

I have run test for quick sort array with 9000 elements in MAC M1 Pro. It can speed up about 0.57% execution time.

Jun 13 '25 08:06 HerrCai0907

I took a look at the draft implementation, and it doesn't look like it takes try or try_table into account. (return (call ...)) inside a try or try_table will be optimized incorrectly. You could also use Properties::getFallthrough to see if the value reaching a return or the end of a function comes from a call, even if there are other things like blocks or local.tee instructions in the way.

Jun 15 '25 01:06 tlively

You could also use Properties::getFallthrough to see if the value reaching a return or the end of a function comes from a call, even if there are other things like blocks or local.tee instructions in the way.

I think it is better replacement for checkTailCall in the current PR, right?

Jun 16 '25 04:06 HerrCai0907

I think it is better replacement for checkTailCall in the current PR, right?

Yes, or at least it could be used inside checkTailCall to simplify it and make it more powerful. You will want to keep the handling of if-else in checkTailCall because Properties::getFallthrough does not handle that.

Jun 16 '25 12:06 tlively

Properties::getFallthrough will handle cases based on "MUST", but this check need "MAY".

Jul 14 '25 16:07 HerrCai0907