Tail-call dispatch
I'd like to suggest implementing tail-call dispatch in QuickJS. Here's a quick demo of what it is.
This was recently done in CPython with great success (a 5-10% performance gain). I've slapped together a WIP patch for QuickJS, and here are some preliminary results (Debian 13 arm64 VM on a Mac M4 with clang 19.1.7, median of 10 runs):
| Test | b2268561 | 2dcc05b1 | % | p_welch |
|---|---|---|---|---|
| Richards | 1799.5 | 1968 | +9.36% | 0.0000* |
| DeltaBlue | 1872.5 | 1836 | -1.95% | 0.0000* |
| Crypto | 2033 | 2472 | +21.59% | 0.0288* |
| RayTrace | 3408.5 | 3652 | +7.14% | 0.0000* |
| EarleyBoyer | 4052 | 4257.5 | +5.07% | 0.5588 |
| RegExp | 1012 | 1023 | +1.09% | 0.5063 |
| Splay | 5693.5 | 5813 | +2.10% | 0.8515 |
| SplayLatency | 19752 | 20219 | +2.36% | 0.0019* |
| NavierStokes | 3730 | 5150.5 | +38.08% | 0.0000* |
| Geomean | 3247 | 3534 | +8.84% | |
The diff is somewhat large (914 insertions, 659 deletions), but the bulk of it is harmless formatting changes that make the CASE blocks less interdependent and splittable into separate functions. Besides tail-call dispatch, being able to split them up like that could also be useful for experimenting with adding a JIT.
Nice! Question: is the enlargement of the stack frame structure not a concern?
I added 7 fields to JS_StackFrame; it was simply the easiest way to pass them through to code inside the CASE blocks, which no longer has access to JS_CallInternal's local variables, but there's likely room to optimize there. Note that I also eliminated 4 of those variables, which should help somewhat.
Oh, and there would of course be some additional stack usage at any transition out of tail-callers, where they need to spill state to the stack, but I haven't measured it. This is a more fundamental cost of the approach, but the performance gains probably justify it, especially since it's easy to turn off at compile time if needed.
Interesting. On x86_64 I measured a (small) speedup of 3.5% after removing one parameter from the opcode functions (otherwise there are not enough saved registers). The main benefit seems to be that the generated code is less susceptible to performance regressions caused by varying compiler optimizations, which are difficult to predict for large functions.