compile-time-regular-expressions
Potential optimization [[clang::musttail]] or __attribute__((musttail))
Clang recently added an attribute that guarantees tail-call optimization, which may help optimize debug builds. I'm not sure whether it produces different results than __forceinline; that may be worth testing.
I was looking at it recently. It only guarantees that the marked call is compiled as a tail call; if that isn't possible, compilation fails.
You are welcome to write a patch, but you will need to gate it behind a CTRE_something_something macro and enable it only for the Clang versions that support the attribute.
OK, I played around with adding it and, like you said, there are compilation errors. A lot of these:
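For anyone picking this up, here is a minimal sketch of what such a guard could look like. CTRE_MUSTTAIL is a made-up name (the thread only says CTRE_something_something), and sum_to is a toy function that just shows where the attribute goes. None of this is CTRE code.

```cpp
// Hypothetical macro name; CTRE does not define this.
// Feature detection is used instead of a hard-coded compiler version check.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define CTRE_MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef CTRE_MUSTTAIL
#  define CTRE_MUSTTAIL  // expands to nothing where the attribute is unavailable
#endif

// Toy accumulator-style recursion. Caller and callee are the same function,
// so their signatures trivially match.
unsigned long long sum_to(unsigned n, unsigned long long acc) {
    if (n == 0) {
        return acc;
    }
    // The attribute applies to the return statement, not to the function.
    CTRE_MUSTTAIL return sum_to(n - 1, acc + n);
}
```

With the attribute active, the recursive call should become a jump even at -O0, so a deep recursion does not grow the stack. Without it, the code still compiles, and whether the call is optimized is up to the compiler.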
error: cannot perform a tail call to function 'evaluate' because its signature is incompatible with the calling function
It might not work for this library. It sounds like the calling function and the "tail" function need to have matching signatures. I guess that makes sense, because otherwise, with something as recursive as this, there's a lot of shuffling registers around.
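To illustrate the constraint, here is a small sketch, not taken from CTRE. As I read the Clang attribute reference, caller and callee must take the same number of arguments with similar types, so a tag passed as a trailing function parameter makes the signatures differ. Moving the tag into a template parameter keeps every signature identical. The names match, digits, letters, done and DEMO_MUSTTAIL are all hypothetical.

```cpp
#include <cstddef>

#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define DEMO_MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef DEMO_MUSTTAIL
#  define DEMO_MUSTTAIL
#endif

// Hypothetical tags standing in for the pattern pieces a matcher dispatches on.
struct digits {};
struct letters {};
struct done {};

// If the tag were a trailing function parameter, e.g.
//   std::size_t match(const char*, std::size_t, digits);
//   std::size_t match(const char*, std::size_t, letters);
// the two overloads would have different parameter types, which is the kind
// of mismatch the error message complains about.

// With the tag as a template parameter, every specialization has the
// signature std::size_t(const char*, std::size_t).
template <typename Tag>
std::size_t match(const char* s, std::size_t n);

template <>
std::size_t match<done>(const char*, std::size_t n) {
    return n;
}

template <>
std::size_t match<letters>(const char* s, std::size_t n) {
    if (*s >= 'a' && *s <= 'z') {
        DEMO_MUSTTAIL return match<letters>(s + 1, n + 1);
    }
    DEMO_MUSTTAIL return match<done>(s, n);
}

template <>
std::size_t match<digits>(const char* s, std::size_t n) {
    if (*s >= '0' && *s <= '9') {
        DEMO_MUSTTAIL return match<digits>(s + 1, n + 1);
    }
    DEMO_MUSTTAIL return match<letters>(s, n);
}
```

Here match<digits>(text, 0) returns the length of the longest prefix of a NUL-terminated string that looks like [0-9]*[a-z]*. Whether CTRE's evaluate overloads could be reshaped this way is a separate question.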
That is strange, because Clang can tail optimize non-matching calls. https://godbolt.org/z/cKrooEM7f
Maybe it's coming to [[clang::musttail]] in Clang 14. The feature is new in Clang 13, so maybe it's not fully fleshed out yet.
Or maybe it's intentional, because the alternative would yield platform-specific code, usually unintentionally. For example, on x86_64 Unix, void a() can tail call void b(int,int,int,int,int), but not on Windows (Unix passes six integer parameters in registers, but Windows only four, and on i386 the register argument count is zero).
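A small sketch of that a/b pair, for pasting into Compiler Explorer. The functions return int instead of void so the result is observable, and [[gnu::noinline]] is there only to keep the call from being inlined away; both are my additions, not part of the example above.

```cpp
// Five int arguments: per the comment above, all of them fit in registers
// on x86_64 Unix, while on Windows x64 the fifth is passed on the stack.
[[gnu::noinline]] int b(int p, int q, int r, int s, int t) {
    return p + q + r + s + t;
}

// No arguments, so a() receives no stack argument area from its own caller
// that it could reuse for b's fifth argument.
int a() {
    // Plain return, no musttail: the optimizer is free to emit a jump or a
    // regular call, and the choice can differ per target.
    return b(1, 2, 3, 4, 5);
}
```

Comparing the -O2 output for a Linux target and a Windows target should show whether a() ends in a jump or in a call.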
I'm pretty sure it should still be OK, at least in this case, although maybe it would have to be hinted to the compiler: the only difference is the last parameter, which is just a tag type and technically isn't used. If you're more into Clang development, maybe run that by the developers. It seems to me that something like the [[maybe_unused]] parameter attribute (e.g. [[unused]]) should exist for exactly this reason: a hint to the compiler that the parameter may be there only to distinguish the function call, not necessarily to make the signature different from another.
It seems like, with optimizations enabled, Clang clearly understands how to handle __forceinline. I would think that if you can inline a function, the tail-call optimization is sort of in between.
Reported to Clang devs: https://github.com/llvm/llvm-project/issues/54964