feat: implement tail call optimization
Fixed: #7647 Now it is just a generic concept about tail call optimization. I am not sure whether I am in the correct way. So I put the draft version and expect to get some feedback.
The basic idea is we can convert
(func ...
(call $f)
)
to
(func ...
(return_call $f)
)
(func ...
(return (call $f))
)
to
(func ...
(return_call $f)
)
(func ...
(if
(condition)
(call $f)
)
)
to
(func ...
(if
(condition)
(return_call $f)
)
)
I believe this is valid, though I also vaguely remember there was a reason it might not always be helpful... @fgmccabe @tlively do you know what the best practices are for automatic conversion of normal calls to tail calls (when the return type permits)?
The only downside I can think of is that this might lead to confusing stack traces. It would also be good to double check with a microbenchmark that this is actually good for performance; I wouldn't be too surprised if tail calls are slow because the engine needs to do some kind of extra shuffling to make them work.
Not sure if this is strictly on point: but automatically converting last calls to tail calls may (will) affect semantics of applications. return_call is useful for so-called tail optimization. But, it's primary purpose is in supporting languages (particularly functional[?] languages like Scheme) that depend on this for their semantics. Since this is not part of C/C++'s semantics, using return_call for emscriptem represents a risk.
Can you elaborate on what the observable semantic differences are? Stack exhaustion doesn't count, since the number of stack frames is an implementation limit and therefore allowed to be broadly nondeterministic, even within a single execution.
Actually, stack exhaustion, as you put it, DOES count (in languages like Scheme). Other considerations include meta-level features such as dynamic scope, profiling, etc.
I agree there are no semantic related issue, combine call and return should not have semantic changing except call stack.
It would also be good to double check with a microbenchmark that this is actually good for performance
According to https://v8.dev/blog/wasm-tail-call#proposal, It should be have some improvement, but I am not sure whether v8 itself do this optimization for some simple cases. I will do this benchmark later.
Aha, dynamic scope is the issue. @HerrCai0907, you'll have to make sure that the optimized calls are not inside try or try_table.
you'll have to make sure that the optimized calls are not inside
tryortry_table.
I think the current implement already consider it.
It would also be good to double check with a microbenchmark that this is actually good for performance
I have run test for quick sort array with 9000 elements in MAC M1 Pro. It can speed up about 0.57% execution time.
I took a look at the draft implementation, and it doesn't look like it takes try or try_table into account. (return (call ...)) inside a try or try_table will be optimized incorrectly. You could also use Properties::getFallthrough to see if the value reaching a return or the end of a function comes from a call, even if there are other things like blocks or local.tee instructions in the way.
You could also use Properties::getFallthrough to see if the value reaching a return or the end of a function comes from a call, even if there are other things like blocks or local.tee instructions in the way.
I think it is better replacement for checkTailCall in the current PR, right?
I think it is better replacement for
checkTailCallin the current PR, right?
Yes, or at least it could be used inside checkTailCall to simplify it and make it more powerful. You will want to keep the handling of if-else in checkTailCall because Properties::getFallthrough does not handle that.
Properties::getFallthrough will handle cases based on "MUST", but this check need "MAY".