compile-time-regular-expressions

Potential optimization [[clang::musttail]] or __attribute__((musttail))

Open Andersama opened this issue 4 years ago • 5 comments

Apparently Clang has a new attribute that guarantees tail call optimization, which may help optimize debug builds. I'm not sure whether it produces different results than __forceinline; that may be something to test.

Andersama avatar May 26 '21 00:05 Andersama

I was looking at it recently. It's only there to guarantee that the call is compiled as a tail call; if that isn't possible, compilation fails.

You are welcome to write a patch, but you will need to guard it with a CTRE_something_something macro and enable it only for the specific Clang versions that support it.
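A guard along those lines might look like the sketch below. The macro name `CTRE_MUSTTAIL` and the function `sum_to` are hypothetical, made up for illustration; neither exists in the library. The sketch assumes that `__has_cpp_attribute(clang::musttail)` is a good enough feature test, so no explicit Clang version number is checked.

```cpp
// Hypothetical macro name; the thread only asks for "CTRE_something_something".
// On Clang versions that know the attribute it expands to [[clang::musttail]],
// everywhere else it expands to nothing and the call is an ordinary return.
#if defined(__clang__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define CTRE_MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef CTRE_MUSTTAIL
#  define CTRE_MUSTTAIL
#endif

// Illustrative function, not part of CTRE: adds n + (n-1) + ... + 1 to acc.
// Caller and callee have the same signature, so the attribute is accepted.
unsigned long long sum_to(unsigned long long n, unsigned long long acc) {
    if (n == 0) {
        return acc;
    }
    // With the attribute active, Clang must emit this as a tail call
    // (even without optimizations) or reject the code.
    CTRE_MUSTTAIL return sum_to(n - 1, acc + n);
}
```

On compilers where the macro is empty, the code still builds and behaves the same; only the guarantee is lost.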

hanickadot avatar May 26 '21 08:05 hanickadot

OK, I played around with adding it in and, like you said, there are compilation errors. A lot of these:

error: cannot perform a tail call to function 'evaluate' because its signature is incompatible with the calling function

It might not work for this library. It sounds like the calling function and the "tail" function need to have matching signatures. I guess that makes sense, because otherwise, with something as recursive as this, there's a lot of shuffling registers around.
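To illustrate the constraint, here is a minimal sketch that is not CTRE's actual `evaluate`. It assumes the pattern can be moved out of the function parameters and into template parameters, so that every step has the identical signature `bool(const char*, const char*)`. The names `CTRE_MUSTTAIL` and `match_seq` are hypothetical, and the example needs C++17 for `if constexpr`.

```cpp
// Hypothetical guard macro, empty on compilers without the attribute.
#if defined(__clang__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define CTRE_MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef CTRE_MUSTTAIL
#  define CTRE_MUSTTAIL
#endif

// Illustrative matcher, not CTRE code: checks that [it, end) is exactly
// the character sequence C, Rest... The pattern lives in the template
// arguments, so every instantiation takes the same two parameters and
// returns bool. A version that passed the remaining pattern as an extra
// tag-type parameter would give each step a different parameter list,
// which is the kind of mismatch the error above complains about.
template <char C, char... Rest>
bool match_seq(const char* it, const char* end) {
    if (it == end || *it != C) {
        return false;
    }
    if constexpr (sizeof...(Rest) == 0) {
        // Last pattern character: the input must end here too.
        return it + 1 == end;
    } else {
        // Same signature as the caller, so a guaranteed tail call is allowed.
        CTRE_MUSTTAIL return match_seq<Rest...>(it + 1, end);
    }
}
```

For example, `match_seq<'a', 'b', 'c'>(s, s + 3)` is true when `s` points at "abc". Whether the library's evaluation functions could be restructured this way is a separate question that this sketch does not answer.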

Andersama avatar May 27 '21 10:05 Andersama

That is strange, because Clang can tail-call optimize calls with non-matching signatures. https://godbolt.org/z/cKrooEM7f

Maybe it's coming to [[clang::musttail]] in Clang 14. The feature is new in Clang 13, so maybe it's not fully fleshed out yet.

Or maybe it's intentional, because the alternative would yield platform-specific code, usually unintentionally. For example, on x86_64 Unix, void a() can tail call void b(int,int,int,int,int), but not on Windows. Unix passes the first six integer parameters in registers, Windows only four, and on i386 the register argument count is zero.

Alcaro avatar May 27 '21 11:05 Alcaro

Pretty sure it should still be OK, at least in this case (although maybe it would have to be hinted to the compiler): the only difference is the last parameter, which is just a tag type and technically isn't used. If you're more into Clang development, maybe run that by the developers. It seems to me that a parameter attribute along the lines of [[maybe_unused]], e.g. [[unused]], should exist just for this reason: a hint to the compiler that the parameter may be there only to distinguish the function call, and not necessarily that the signature differs from another's.

It seems like, with optimizations enabled, Clang clearly understands how to handle __forceinline. I would think that if you can inline a function, the tail call optimization is sort of in between.

Andersama avatar May 28 '21 00:05 Andersama

Reported to Clang devs: https://github.com/llvm/llvm-project/issues/54964

davidbolvansky avatar Apr 18 '22 17:04 davidbolvansky