cppcoro icon indicating copy to clipboard operation
cppcoro copied to clipboard

Remove use of std::atomic in task and async_generator once symmetric transfer is available

Open lewissbaker opened this issue 6 years ago • 5 comments

The current implementation of task and async_generator make use of atomics to arbitrate between a potential race between the awaiting consumer suspending while waiting for a value to be produced and the completion of the next value.

An alternative approach which doesn't require use of atomic operations is to suspend the awaiting coroutine first, attach its coroutine_handle as the continuation of the task/async_generator and then resume the task/async_generator. When the task/async_generator produces a value, it calls .resume() on the coroutine handle. It can do this without needing any conditionals or atomic operations since the continuation is attached before the coroutine starts executing.

The problem with this alternative approach is that it can lead to stack-overflow if the coroutine produces its value synchronously and the awaiting coroutine awaits many tasks in a row that all complete synchronously. This is because the awaiting coroutine resumes the producer coroutine inside an await_suspend() method by calling coroutine_handle::resume(). If the producer coroutine produces a value synchronously then inside the call to coroutine_handle::resume() it will call await_suspend() at either a co_yield or final-suspend point and then resume the continuation by calling coroutine_handle::resume() on the awaiting coroutine.

For example, if you have a coroutine foo that continually awaits task<> bar() which completes synchronously then you can end up with a stack-trace that looks like:

  • foo [resume]
  • task::promise_type::final_awaiter::await_suspend()
  • bar [resume]
  • task::await_suspend()
  • foo [resume]
  • task::promise_type::final_awaiter::await_suspend()
  • bar [resume]
  • task::await_suspend()
  • foo [resume]
  • etc...

Under Clang optimised builds these calls can be performed as tail-calls which avoids the stack-overflow issue (they are all void-returning functions). However this isn't guaranteed - Clang debug and both MSVC optimised and debug builds are not able to perform tail-calls here.

To work around this, the current implementation first resumes the task's coroutine and waits until it suspends. Then it checks to see if it completes synchronously and if so then returns false from await_suspend() to continue execution of the awaiting coroutine without incurring an extra stack-frame. However, this means we then need to use atomics to decide the race between the producer running to completion on another thread and the consumer suspending on the current thread.

@GorNishanov has recently added an extension to Clang that allows returning coroutine_handle from await_suspend() that will be immediately resumed, but this time with guarantee that the compiler will perform a tail-call resumption of the returned coroutine. This is called "symmetric transfer" and allow suspending one coroutine and resuming another without consuming any additional stack-space.

Once this capability is fully implemented in clang and is also available in MSVC we can get rid of the use of atomics in task and async_generator and make use of symmetric transfer to avoid stack-overflow instead. This should improve performance of these classes.

lewissbaker avatar Nov 02 '17 06:11 lewissbaker

Note that there are some experimental implementations of the task, async_generator and recursive_generator that use the symmetric transfer facility in the https://github.com/lewissbaker/cppcoro/tree/tailcall branch

lewissbaker avatar Nov 02 '17 11:11 lewissbaker

PR #91 has merged support for symmetric-transfer on compilers that support it.

It's currently conditionally compiled in with #if CPPCORO_COMPILER_SUPPORTS_SYMMETRIC_TRANSFER which is currently only defined for Clang.

Once we have a version of MSVC that supports this we can switch to this implementation for MSVC too.

lewissbaker avatar Oct 18 '18 17:10 lewissbaker

It's unclear to me whether the latest MSVC versions support this by now or not. I have not tried yet with cppcoro but wrote my own promise and task classes. With MSVC 16.5, I am able to return std::coroutine_handle from await_suspend but I am still running into stack overflows when repeated co_awaits finish synchronously. I guess the feature may only be partially supported right now (tail calls are not guaranteed yet)?

chausner avatar Mar 23 '20 20:03 chausner

Yes. The front-end part was implemented, but, back end is not (hence the stack overflow).

It is scheduled to be done soon-ish


From: chausner [email protected] Sent: Monday, March 23, 2020 1:42 PM To: lewissbaker/cppcoro [email protected] Cc: Gor Nishanov [email protected]; Mention [email protected] Subject: Re: [lewissbaker/cppcoro] Remove use of std::atomic in task and async_generator once symmetric transfer is available (#55)

It's unclear to me whether the latest MSVC versions support this by now or not. I have not tried yet with cppcoro but wrote my own promise and task classes. With MSVC 16.5, I am able to return std::coroutine_handle from await_suspend but I am still running into stack overflows when repeated co_awaits finish synchronously. I guess the feature may only be partially supported right now (tail calls are not guaranteed yet)?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Flewissbaker%2Fcppcoro%2Fissues%2F55%23issuecomment-602845039&data=02%7C01%7Cgorn%40microsoft.com%7C2dcbcbda47344634091608d7cf6aad31%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637205929385488387&sdata=iCHQDFqnshQIkohtrGf6QOQH1KUMZvm5NPi%2Fa%2ByYUYw%3D&reserved=0, or unsubscribehttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABLXD4NHTEYGTYCMVK72QM3RI7CSRANCNFSM4EB467YQ&data=02%7C01%7Cgorn%40microsoft.com%7C2dcbcbda47344634091608d7cf6aad31%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637205929385498381&sdata=lctN1KSodbxYleNm6ZWD43OUQmBPbRMjQeGBXqzGmAc%3D&reserved=0.

GorNishanov avatar Mar 23 '20 22:03 GorNishanov

Yes. The front-end part was implemented, but, back end is not (hence the stack overflow). It is scheduled to be done soon-ish

Thanks for the clarification. Looking forward to seeing it fully supported.

chausner avatar Mar 23 '20 22:03 chausner