llvm-project
llvm-project copied to clipboard
C++20 coroutine miscompiled if it contains a dispatch table + computed goto
Repro: https://godbolt.org/z/j3o96sYd1 clang trunk accepts the program, but miscompiles it. gcc trunk compiles the program correctly.
The program implements a simple interpreter using the a dispatch table as described, e.g., in https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables.
The problem only occurs when combining coroutines with a dispatch table and a computed goto. A switch-based interpreter works as expected.
Looking at the output of -O0 -emit-llvm
, I think the problem is that:
- The pointers stored inside the
_ZZ9interpretP11InstructionRiE14dispatch_table
, i.e. the static dispatch table, refer to labels from the original, pre-split coroutine - The post-split
resume
clone of the coroutine tries to dispatch from this table, though. Theresume
function would do an indirect jump back into the pre-split coroutine - which makes no sense - The SimplifyCFG correctly identifies those jumps as non-sensical and therefore replaces the complete
resume
function byunreachable
.
I think the underlying root cause here is that the labels stored in the static void* dispatch_table
are duplicated by coroutine splitting. I.e., after the coro-split pass, there are in fact two inc
labels: one in the ramp function, and one in the resume function. It is not clear which of the both labels the dispatchTable
should refer to.
The problem becomes even more apparent in https://godbolt.org/z/7aYaG9xfE, where the inital_suspend
was replaced by a suspend_never
. In this case, we would need two separate dispatch tables: one storing the label pointers from the ramp function and a second one containing the label pointers for the resume function.
Judging by the assembly generated by gcc (https://godbolt.org/z/PMsqY5e5Y), it seems that they are able to avoid this problem, by splitting the coroutine in a different way: While LLVM duplicates the complete interpreter loop and its labels in both the ramp and the resume function. gcc instead only emits the interpreter loop in the "actor" function (== LLVM's resume function), and calls the actor function from the ramp function.
@ChuanqiXu9 FYI
Thanks for reporting. I'll take a look.
@llvm/issue-subscribers-c-20
@llvm/issue-subscribers-coroutines
Your analysis is basically correct. And this one is not easy to fix. The biggest difference of coroutines between clang and gcc is that gcc handles coroutines in the frontend but clang handles it in the middle end. This brings many differences.
For example, in the frontend, we could scope informations. But in the middle end, we will store them in a global variable and there is no information about scopes. And it doesn't make a lot of sense to copy a global variable when splitting coroutines.
Also the computed goto are not standard C++. It is a GCC extension although clang supports it too. So I prefer to add a warning/error instead of doing some hacks for the non-stantandard use cases.
And from the link you pasted, computed goto
looks like primarily for performance but it doesn't relate to correctness. So I think you want an interpretable interpreter? If yes, I think we could make it by: https://godbolt.org/z/5Y8b48r6G
I prefer to add a warning/error instead of doing some hacks for the non-stantandard use cases.
I agree. A warning should be fine for the time being. In case there is larger demand for combining computed goto with coroutines, we can still add support for that later.
It took a pretty long time to understand why the interpreter didn't work as expected. (I was trying this on a much bigger, real-world production interpreter for SQL queries originally...) A clear warning/error would have saved me a lot of time.
I think, we should furthermore disable inlining of any function which contains computed-gotos into a coroutine function.
computed goto looks like primarily for performance but it doesn't relate to correctness
right
So I think you want an interpretable interpreter? If yes, I think we could make it by: https://godbolt.org/z/5Y8b48r6G
yes, that works, but it does not give me the performance I needed.
In the actual production interpreter, I solved the problem as shown in https://godbolt.org/z/17G8jGbse:
I did not use C++ coroutines in the interpreter loop itself, but instead kept track of the current suspension point. Doing so is a trivial change because the interpreter has to keep track of the next instruction anyway. When encountering a suspension point, I return from the interpreter-function and signal that I want to be called by returning false
instead of true
.
For https://godbolt.org/z/17G8jGbse, it is vital that LLVM does not inline the doInterpret
function into the interpret
function. Afaict, this is currently not forbidden by LLVM, and I was just lucky that code was not inlined in this case. We should probably change LLVM to actually enforce that it never inlines computed-gotos into a coroutine function
I think, we should furthermore disable inlining of any function which contains computed-gotos into a coroutine function.
Currently, the LLVM will not inline these kind of function by default: https://godbolt.org/z/v4Kvqrfzj. So we could only emit an error for it simply. I'll try to make it.
yes, that works, but it does not give me the performance I needed.
Just out of interesting, how much is the performance gap?
Closing, as this was fixed in cc526e3 which will ship with clang-16