Data race problem in mp_coro::sync_await?
I tried one of the examples with MSVC 2022 and got a wrong return value in a debug build, while GCC on Compiler Explorer was fine.
Shouldn't this function
template<awaitable A>
[[nodiscard]] decltype(auto) sync_await(A&& awaitable)
be declared like this
template<awaitable A>
[[nodiscard]] auto sync_await(A&& awaitable)
So no reference can be returned?
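To illustrate what I mean with a minimal standalone sketch (not the library's code): auto strips references from the deduced return type, while decltype(auto) preserves them.

#include <type_traits>

int value = 42;

auto get_by_value() { return (value); }          // auto deduces int: the reference is stripped, a copy is returned
decltype(auto) get_by_ref() { return (value); }  // decltype(auto) deduces int&: the reference is preserved

static_assert(std::is_same_v<decltype(get_by_value()), int>);
static_assert(std::is_same_v<decltype(get_by_ref()), int&>);

int main() {}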
Of course, the whole path the return value takes could be reviewed to see where a reference is OK and where it should be passed by value.
And shouldn't the return value be properly synchronized, since the std::binary_semaphore used here does not necessarily synchronize the memory where the return value resides between threads?
x86 and x86-64 do not require memory fencing for this, but Arm might, I think.
Concurrency is hard, and the total lack of warnings, static checks, and support from compilers is offensive.
Hi @raymundhofmann, decltype(auto) is correct. Please note that the library also supports task<int&>.
I don't know what the synchronization problem you are referring to is. The synchronization primitive is used only to stop waiting when the value is ready. It does not guarantee any thread safety of the underlying storage (e.g., similarly to a std::shared_ptr).
Concurrency is hard, and the total lack of warnings, static checks, and support from compilers is offensive.
Do you mean C++ in general or something specific with this simple library?
Hi @raymundhofmann,
decltype(auto) is correct. Please note that the library also supports task<int&>.
Then the AddressSanitizer in MSVC 2022, compiling "example/simple_async_tasks.cpp" and reporting "AddressSanitizer: heap-use-after-free", must be wrong for the task<value> case.
How is one supposed to make it work when returning by value?
I don't know what the synchronization problem you are referring to is. The synchronization primitive is used only to stop waiting when the value is ready. It does not guarantee any thread safety of the underlying storage (e.g., similarly to a std::shared_ptr).
The main issue is that it is returning a reference to a scope that has ended.
The synchronization problem is this:
https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence
https://www.sciencedirect.com/science/article/pii/S1383762124000390#:~:text=Even%20though%20both%20x86%20and,differently%20from%20the%20program%20order.
Basically it means that if one thread is signaled by the std::binary_semaphore that the value is ready, the value it then reads might not be up to date, unless explicit memory-fence synchronization is used.
I am not sure, though, as I didn't see anything in the std::binary_semaphore documentation about whether it does memory fencing internally.
If an atomic flag were used instead, the value read afterwards would be synchronized by the implicit memory-fence operations of the atomic flag accesses, but for the semaphore used here I think it might not be.
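Roughly what I mean, as a minimal release/acquire sketch with std::atomic<bool> in place of the semaphore (names made up by me):

#include <atomic>
#include <cassert>
#include <thread>

int result = 0;                      // plain, non-atomic payload
std::atomic<bool> ready{false};

void producer() {
    result = 42;                                   // write the payload
    ready.store(true, std::memory_order_release);  // release store publishes the payload write
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // acquire load pairs with the release store
    assert(result == 42);                               // guaranteed to observe the payload write
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}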
As said, on x86/x86-64 memory is kept synchronized between the cores, but not on Arm, because that is costly in terms of energy and silicon area.
https://developer.arm.com/documentation/100941/0101/Barriers
Too bad that Compiler Explorer can generate ARM code but can't run it, because it is on x86 servers.
Concurrency is hard, and the total lack of warnings, static checks, and support from compilers is offensive.
Do you mean C++ in general or something specific with this simple library?
I mean that even when the compiler clearly sees that a reference to an ending scope is returned, it doesn't give an error, and that violations of temporary lifetime extension of references aren't detected by compilers.
There is so much that could be improved in the tools there.
The main issue is that it is returning a reference to a scope that has ended.
The library should return a reference only in case of task<T&>. Otherwise, it should return by value. If this is not the case, it is a bug.
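As a rough sketch of that contract (a hypothetical result_storage type made up for illustration, not the library's actual implementation):

#include <optional>
#include <utility>

template<typename T>
struct result_storage {                        // task<T>: the storage owns the value
    std::optional<T> value;
    T get() { return std::move(*value); }      // handed out by value
};

template<typename T>
struct result_storage<T&> {                    // task<T&>: the storage only points at an object owned elsewhere
    T* ptr = nullptr;
    T& get() { return *ptr; }                  // handed out by reference
};

int main() {
    result_storage<int> v;
    v.value = 42;
    int copied = v.get();      // a copy; no reference to the storage escapes

    int i = 7;
    result_storage<int&> r;
    r.ptr = &i;
    int& ref = r.get();        // a reference to i, which is owned by the caller
    (void)copied; (void)ref;
}

A sync_await declared with decltype(auto) can then forward whatever get() returns: a prvalue for task<T> and a T& for task<T&>.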
Basically it means that if one thread is signaled by the std::binary_semaphore that the value is ready, the value it then reads might not be up to date, unless explicit memory-fence synchronization is used.
I do not think that it is needed. std::counting_semaphore is meant for synchronization between threads, so it should work fine here.
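To sketch the handoff I mean (a minimal example, not the library's code; it relies on the standard's guarantee that a semaphore release() strongly happens before the acquire() that observes it):

#include <cassert>
#include <semaphore>
#include <thread>

int result = 0;                 // plain, non-atomic payload
std::binary_semaphore done{0};

void producer() {
    result = 42;     // write the payload
    done.release();  // strongly happens before the acquire() that observes it
}

void consumer() {
    done.acquire();        // returns only after release(); the payload write is then visible
    assert(result == 42);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}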