llvm-project C++20 coroutine miscompiled if it contains a dispatch table + computed goto

Repro: https://godbolt.org/z/j3o96sYd1 clang trunk accepts the program, but miscompiles it. gcc trunk compiles the program correctly.

The program implements a simple interpreter using the a dispatch table as described, e.g., in https://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables.

The problem only occurs when combining coroutines with a dispatch table and a computed goto. A switch-based interpreter works as expected.

Jul 07 '22 23:07 vogelsgesang

Looking at the output of -O0 -emit-llvm, I think the problem is that:

The pointers stored inside the _ZZ9interpretP11InstructionRiE14dispatch_table, i.e. the static dispatch table, refer to labels from the original, pre-split coroutine
The post-split resume clone of the coroutine tries to dispatch from this table, though. The resume function would do an indirect jump back into the pre-split coroutine - which makes no sense
The SimplifyCFG correctly identifies those jumps as non-sensical and therefore replaces the complete resume function by unreachable.

I think the underlying root cause here is that the labels stored in the static void* dispatch_table are duplicated by coroutine splitting. I.e., after the coro-split pass, there are in fact two inc labels: one in the ramp function, and one in the resume function. It is not clear which of the both labels the dispatchTable should refer to.

The problem becomes even more apparent in https://godbolt.org/z/7aYaG9xfE, where the inital_suspend was replaced by a suspend_never. In this case, we would need two separate dispatch tables: one storing the label pointers from the ramp function and a second one containing the label pointers for the resume function.

Judging by the assembly generated by gcc (https://godbolt.org/z/PMsqY5e5Y), it seems that they are able to avoid this problem, by splitting the coroutine in a different way: While LLVM duplicates the complete interpreter loop and its labels in both the ramp and the resume function. gcc instead only emits the interpreter loop in the "actor" function (== LLVM's resume function), and calls the actor function from the ramp function.

@ChuanqiXu9 FYI

Jul 08 '22 00:07 vogelsgesang

Thanks for reporting. I'll take a look.

Jul 08 '22 03:07 ChuanqiXu9

@llvm/issue-subscribers-c-20

Jul 08 '22 06:07 llvmbot

@llvm/issue-subscribers-coroutines

Aug 12 '22 05:08 llvmbot

Your analysis is basically correct. And this one is not easy to fix. The biggest difference of coroutines between clang and gcc is that gcc handles coroutines in the frontend but clang handles it in the middle end. This brings many differences.

For example, in the frontend, we could scope informations. But in the middle end, we will store them in a global variable and there is no information about scopes. And it doesn't make a lot of sense to copy a global variable when splitting coroutines.

Also the computed goto are not standard C++. It is a GCC extension although clang supports it too. So I prefer to add a warning/error instead of doing some hacks for the non-stantandard use cases.

Aug 12 '22 09:08 ChuanqiXu9

And from the link you pasted, computed goto looks like primarily for performance but it doesn't relate to correctness. So I think you want an interpretable interpreter? If yes, I think we could make it by: https://godbolt.org/z/5Y8b48r6G

Aug 12 '22 09:08 ChuanqiXu9

I prefer to add a warning/error instead of doing some hacks for the non-stantandard use cases.

I agree. A warning should be fine for the time being. In case there is larger demand for combining computed goto with coroutines, we can still add support for that later.

It took a pretty long time to understand why the interpreter didn't work as expected. (I was trying this on a much bigger, real-world production interpreter for SQL queries originally...) A clear warning/error would have saved me a lot of time.

I think, we should furthermore disable inlining of any function which contains computed-gotos into a coroutine function.

computed goto looks like primarily for performance but it doesn't relate to correctness

right

So I think you want an interpretable interpreter? If yes, I think we could make it by: https://godbolt.org/z/5Y8b48r6G

yes, that works, but it does not give me the performance I needed. In the actual production interpreter, I solved the problem as shown in https://godbolt.org/z/17G8jGbse: I did not use C++ coroutines in the interpreter loop itself, but instead kept track of the current suspension point. Doing so is a trivial change because the interpreter has to keep track of the next instruction anyway. When encountering a suspension point, I return from the interpreter-function and signal that I want to be called by returning false instead of true.

For https://godbolt.org/z/17G8jGbse, it is vital that LLVM does not inline the doInterpret function into the interpret function. Afaict, this is currently not forbidden by LLVM, and I was just lucky that code was not inlined in this case. We should probably change LLVM to actually enforce that it never inlines computed-gotos into a coroutine function

Aug 13 '22 00:08 vogelsgesang

I think, we should furthermore disable inlining of any function which contains computed-gotos into a coroutine function.

Currently, the LLVM will not inline these kind of function by default: https://godbolt.org/z/v4Kvqrfzj. So we could only emit an error for it simply. I'll try to make it.

yes, that works, but it does not give me the performance I needed.

Just out of interesting, how much is the performance gap?

Aug 15 '22 09:08 ChuanqiXu9

Closing, as this was fixed in cc526e3 which will ship with clang-16

Feb 10 '23 10:02 vogelsgesang

llvm-project llvm-project copied to clipboard

C++20 coroutine miscompiled if it contains a dispatch table + computed goto

llvm-project
llvm-project copied to clipboard