marl
ASAN errors on MSVC Win32 builds
Using the new MSVC support for ASAN, running marl-unittests will report different address sanitizer errors on each run, but always from SchedulerParams/WithBoundScheduler.BlockingCallVoidReturn/*. It's possible skipping this suite would lead to ASAN failures in tests that run afterwards, but I haven't tried.
Repro steps
- Run the Visual Studio installer and install ASAN support
- Generate Win32 sln with tests enabled:
  mkdir build32 && cd build32
  cmake -A Win32 -DMARL_BUILD_TESTS=1 ..
- Open Marl.sln with VS2019, select both marl and marl-unittests in Solution Explorer, right-click Properties, set Configurations to All Configurations, C/C++ --> General --> Enable Address Sanitizer (Experimental) --> Yes
- Set configuration to RelWithDebInfo, build and run marl-unittests.
Result
I've gotten different outputs with multiple runs. Here's one:
[ RUN ] SchedulerParams/WithBoundScheduler.BlockingCallVoidReturn/3
=================================================================
==20236==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x057ff7b8 at pc 0x0061bbdc bp 0x057ff6fc sp 0x057ff6f0
READ of size 4 at 0x057ff7b8 thread T16777215
==20236==WARNING: Failed to use and restart external symbolizer!
#0 0x61bbdb in std::_Construct_in_place<marl::WaitGroup::Data,marl::Allocator * &> C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\xmemory:229
#1 0x622f58 in marl::WaitGroup::WaitGroup C:\src\marl\include\marl\waitgroup.h:82
#2 0x626643 in <lambda_805be53bdef90e2d0ff16255d07f9da5>::operator() C:\src\marl\src\blockingcall_test.cpp:31
#3 0x628c0d in std::_Func_impl_no_alloc<<lambda_805be53bdef90e2d0ff16255d07f9da5>,void>::_Do_call C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\functional:903
#4 0x76fa1a in std::_Func_class<void>::operator() C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\functional:951
#5 0x77eb72 in marl::Scheduler::Worker::runUntilIdle C:\src\marl\src\scheduler.cpp:691
#6 0x77ee16 in marl::Scheduler::Worker::runUntilShutdown C:\src\marl\src\scheduler.cpp:576
#7 0x77e5a1 in marl::Scheduler::Worker::run C:\src\marl\src\scheduler.cpp:569
#8 0x770cdc in std::_Func_impl_no_alloc<<lambda_f08fac5c22a42aa758c925a9c8a41778>,void>::_Do_call C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\functional:903
#9 0x6a45f6 in marl::OSFiber::run C:\src\marl\src\osfiber_windows.h:97
#10 0x770587f0 in CreateProcessW+0x90 (C:\WINDOWS\System32\KERNELBASE.dll+0x101087f0)
#11 0x770587a5 in CreateProcessW+0x45 (C:\WINDOWS\System32\KERNELBASE.dll+0x101087a5)
#12 0x77b31b96 in RtlUserFiberStart+0x16 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x4b2f1b96)
Address 0x057ff7b8 is located in stack of thread T153 at offset 104 in frame
#0 0x603800 in ILT+10235(??1?$shared_ptrVmutexstdstdQAEXZ)+0x0 (C:\src\marl\build32\RelWithDebInfo\marl-unittests.exe+0x403800)
This frame has 1 object(s):
[16, 20) '_Value' <== Memory access at offset 104 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp, SEH and C++ exceptions *are* supported)
Thread T153 created by unknown thread
SUMMARY: AddressSanitizer: stack-buffer-overflow C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\xmemory:229 in std::_Construct_in_place<marl::WaitGroup::Data,marl::Allocator * &>
Shadow bytes around the buggy address:
0x30affea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30affeb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30affec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30affed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30affee0: 00 00 00 00 00 00 00 00 00 00 f1 f1 04 f3 f3 f3
=>0x30affef0: f3 00 00 00 00 00 00[f2]f2 f2 f2 f8 f1 f1 00 04
0x30afff00: f2 f2 f2 f2 04 f2 04 f2 00 f2 04 f2 00 f3 f3 f3
0x30afff10: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30afff20: 00 00 00 00 f1 f1 04 f3 f3 f3 f3 f1 f1 00 00 00
0x30afff30: 00 00 00 f2 f2 f2 f2 f8 f3 f3 f3 f3 f1 f1 00 00
0x30afff40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
I'm able to reproduce, and ASAN seems to fire with the most basic of fiber tests.
For example: SchedulerParams/WithBoundScheduler.WaitGroup_OneTask/0 on repeat fails after the 4th or 5th iteration. Note that the /0 indicates there are no dedicated worker threads being spawned, so execution should be purely deterministic. However, the iteration on which ASAN fails seems random.
Reducing this down even further, this also fails:
TEST_P(WithBoundScheduler, WaitGroup_Basic) {
  marl::WaitGroup wg(1);
  marl::schedule([wg] { wg.done(); });
  wg.wait();
}
Attempting to replicate this with a minimal version using just Win32 calls:
TEST(BasicFibers) {
  struct S {
    static void __stdcall run(void* arg) {
      auto& func = *reinterpret_cast<std::function<void()>*>(arg);
      func();
    }
  };
  auto mainFiber = ConvertThreadToFiberEx(nullptr, FIBER_FLAG_FLOAT_SWITCH);
  ASSERT_NE(mainFiber, nullptr);
  int i = 0;
  std::function<void()> inc_i = [&] {
    i++;
    SwitchToFiber(mainFiber);
  };
  auto incIFiber = CreateFiberEx(0xffff, 0x10000, FIBER_FLAG_FLOAT_SWITCH, &S::run, &inc_i);
  ASSERT_NE(incIFiber, nullptr);
  ASSERT_EQ(i, 0);
  SwitchToFiber(incIFiber);
  ASSERT_EQ(i, 1);
  DeleteFiber(incIFiber);
  ConvertFiberToThread();
}
Seems to reliably pass.
Continuing to investigate...
On Linux ASAN requires use of __sanitizer_start_switch_fiber and __sanitizer_finish_switch_fiber to tell it about switching stacks. Not doing so causes false positives when ASAN shadow stack and real stack no longer match.
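For illustration, here is a minimal sketch of how those two annotations are typically paired around a stack switch on Linux, using ucontext rather than marl's own fiber implementation. The names in the sketch namespace (startSwitch, finishSwitch, State, runOnAnnotatedFiber) are hypothetical, and the wrappers compile to no-ops when ASAN is not enabled, so this is an assumption-laden example and not marl code.

```cpp
#include <ucontext.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Detect ASAN under GCC (__SANITIZE_ADDRESS__) or Clang (__has_feature).
#if defined(__SANITIZE_ADDRESS__)
#define SKETCH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_ASAN 1
#endif
#endif

#ifdef SKETCH_ASAN
#include <sanitizer/common_interface_defs.h>
#endif

namespace sketch {  // hypothetical names, for illustration only

// Thin wrappers so the example also builds without ASAN (they become no-ops).
inline void startSwitch(void** fakeStackSave, const void* bottom, std::size_t size) {
#ifdef SKETCH_ASAN
  __sanitizer_start_switch_fiber(fakeStackSave, bottom, size);
#else
  (void)fakeStackSave; (void)bottom; (void)size;
#endif
}

inline void finishSwitch(void* fakeStackSave, const void** bottomOld, std::size_t* sizeOld) {
#ifdef SKETCH_ASAN
  __sanitizer_finish_switch_fiber(fakeStackSave, bottomOld, sizeOld);
#else
  (void)fakeStackSave; (void)bottomOld; (void)sizeOld;
#endif
}

struct State {
  ucontext_t mainCtx;
  ucontext_t fiberCtx;
  const void* mainBottom = nullptr;  // bounds of the stack we switched away from
  std::size_t mainSize = 0;
  int counter = 0;
};

State* g_state = nullptr;

void fiberEntry() {
  // First code on the new stack: complete the switch and record the old stack bounds.
  finishSwitch(nullptr, &g_state->mainBottom, &g_state->mainSize);
  g_state->counter++;
  // This fiber never runs again, so pass nullptr instead of saving its fake stack.
  startSwitch(nullptr, g_state->mainBottom, g_state->mainSize);
  swapcontext(&g_state->fiberCtx, &g_state->mainCtx);
}

int runOnAnnotatedFiber() {
  State state;
  g_state = &state;
  std::vector<char> stack(64 * 1024);  // memory used as the fiber's stack

  getcontext(&state.fiberCtx);
  state.fiberCtx.uc_stack.ss_sp = stack.data();
  state.fiberCtx.uc_stack.ss_size = stack.size();
  state.fiberCtx.uc_link = nullptr;  // fiberEntry switches back explicitly
  makecontext(&state.fiberCtx, fiberEntry, 0);

  void* fakeStack = nullptr;
  startSwitch(&fakeStack, stack.data(), stack.size());  // announce the destination stack
  swapcontext(&state.mainCtx, &state.fiberCtx);
  finishSwitch(fakeStack, nullptr, nullptr);  // back on the original stack

  g_state = nullptr;
  return state.counter;
}

}  // namespace sketch
```

Calling sketch::runOnAnnotatedFiber() runs the increment on the separate stack and returns the counter. Each start call is issued just before leaving a stack, and the matching finish call is the first thing executed after arriving on the other one.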
Hi Turo,
This is specific to Windows, where I'd expect SwitchToFiber() to handle this automatically - perhaps this is a bad assumption?
Marl doesn't use __sanitizer_start_switch_fiber / __sanitizer_finish_switch_fiber and still passes all ASAN and TSAN tests on Linux and macOS. ~I experimented with adding these calls, and unfortunately it introduces unwanted failures for cross-fiber state access within the scheduler. This logic is known to be correct, and I could not find a clean way to silence them.~
Update: I was thinking of TSAN (__tsan_switch_to_fiber, etc).
This certainly smells like a fiber issue. I've decorated the Win32 calls in src\osfiber_windows.h with logging so we can see exactly what's going on - including printing the top and bottom of the stacks after each fiber switch via GetCurrentThreadStackLimits():
Repeating all tests (iteration 3) . . .
Note: Google Test filter = *WaitGroup_Basic/0
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SchedulerParams/WithBoundScheduler
[ RUN ] SchedulerParams/WithBoundScheduler.WaitGroup_Basic/0
SetUp()
00712A88 = ConvertThreadToFiberEx()
STACK: [0x400000 - 0x500000]
00712D88 = CreateFiberEx(0xffff, 0x10000, FIBER_FLAG_FLOAT_SWITCH)
SwitchToFiber(00712D88)
STACK: [0x570000 - 0x580000]
=================================================================
==30752==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x0057fa74 at pc 0x00a5ee2c bp 0x0057f99c sp 0x0057f990
READ of size 4 at 0x0057fa74 thread T0
==30752==WARNING: Failed to use and restart external symbolizer!
#0 0xa5ee2b in std::_Construct_in_place<std::_List_node<marl::Scheduler::Fiber *,void *> *,std::_List_node<marl::Scheduler::Fiber *,void *> * const &> C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\xmemory:229
#1 0xa78d68 in std::_Hash<std::_Uset_traits<marl::Scheduler::Fiber *,std::_Uhash_compare<marl::Scheduler::Fiber *,std::hash<marl::Scheduler::Fiber *>,std::equal_to<marl::Scheduler::Fiber *> >,marl::StlAllocator<marl::Scheduler::Fiber *>,0> >::_Insert_new_node_before C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\xhash:1704
#2 0xa648e5 in std::_Hash<std::_Uset_traits<marl::Scheduler::Fiber *,std::_Uhash_compare<marl::Scheduler::Fiber *,std::hash<marl::Scheduler::Fiber *>,std::equal_to<marl::Scheduler::Fiber *> >,marl::StlAllocator<marl::Scheduler::Fiber *>,0> >::emplace<marl::Scheduler::Fiber * &> C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\xhash:622
#3 0xa8758f in marl::Scheduler::Worker::runUntilIdle C:\src\marl\src\scheduler.cpp:677
#4 0xa8793c in marl::Scheduler::Worker::runUntilShutdown C:\src\marl\src\scheduler.cpp:576
#5 0xa8711d in marl::Scheduler::Worker::run C:\src\marl\src\scheduler.cpp:569
#6 0xa75365 in std::_Func_impl_no_alloc<<lambda_271fe4e226a9905d17d63bbb825e24f8>,void>::_Do_call C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\functional:903
#7 0x958056 in marl::OSFiber::run C:\src\marl\src\osfiber_windows.h:111
#8 0x757087f0 in CreateProcessW+0x90 (C:\windows\System32\KERNELBASE.dll+0x101087f0)
#9 0x757087a5 in CreateProcessW+0x45 (C:\windows\System32\KERNELBASE.dll+0x101087a5)
#10 0x77951b96 in RtlUserFiberStart+0x16 (C:\windows\SYSTEM32\ntdll.dll+0x4b2f1b96)
Address 0x0057fa74 is a wild pointer.
SUMMARY: AddressSanitizer: stack-buffer-overflow C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include\xmemory:229 in std::_Construct_in_place<std::_List_node<marl::Scheduler::Fiber *,void *> *,std::_List_node<marl::Scheduler::Fiber *,void *> * const &>
Shadow bytes around the buggy address:
0x300afef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x300aff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x300aff10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x300aff20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x300aff30: 00 00 00 00 00 f8 f3 f3 f3 f3 00 00 00 00 00 00
=>0x300aff40: f1 f1 04 f3 f3 f3 f3 00 00 00 00 00 f2 f2[f2]f2
0x300aff50: 04 f2 00 f2 f1 f1 f8 f2 00 f2 f8 f2 f8 f2 04 f2
0x300aff60: 01 f3 f3 f3 f3 00 00 00 00 00 f2 f2 04 f2 00 f2
0x300aff70: f8 f3 f3 f3 f3 00 00 00 f1 f1 00 00 00 00 00 00
0x300aff80: f2 f2 f2 f2 04 f2 00 f2 00 f3 f3 f3 f3 00 00 00
0x300aff90: 00 f1 f1 00 00 00 00 00 00 00 00 00 00 f2 04 f3
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==30752==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x00500000; bottom 0x0057d000; size: 0xfff83000 (-512000)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
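Note that the stack is reported as [0x570000 - 0x580000] and the stack-buffer-overflow address 0x0057fa74 lies inside it, yet the WARNING says: __asan_handle_no_return: stack top: 0x00500000; bottom 0x0057d000; size: 0xfff83000 (-512000). The reported top matches the upper limit of the original thread stack [0x400000 - 0x500000], while the bottom falls inside the fiber's stack, so the computed size is negative. This suggests ASAN is still using the original thread's stack bounds after the fiber switch.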
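The numbers in that warning can be checked with a little 32-bit arithmetic. The sketch below uses only the addresses printed in the log above. The StackRange type and helper names are hypothetical, and reading the warning as "top from the thread stack, bottom from the fiber stack" is an interpretation of the log, not something confirmed against the MSVC ASAN runtime.

```cpp
#include <cassert>
#include <cstdint>

// Half-open address range [low, high), as printed by the STACK: log lines.
struct StackRange {
  std::uint32_t low;
  std::uint32_t high;
  bool contains(std::uint32_t addr) const { return addr >= low && addr < high; }
};

// Values taken from the log for WaitGroup_Basic/0.
constexpr StackRange kThreadStack{0x00400000u, 0x00500000u};  // after ConvertThreadToFiberEx()
constexpr StackRange kFiberStack{0x00570000u, 0x00580000u};   // after SwitchToFiber()
constexpr std::uint32_t kFaultAddress = 0x0057fa74u;
constexpr std::uint32_t kReportedTop = 0x00500000u;
constexpr std::uint32_t kReportedBottom = 0x0057d000u;

// Unsigned 32-bit subtraction wraps around, which yields the huge "size" value.
constexpr std::uint32_t wrappedSize(std::uint32_t top, std::uint32_t bottom) {
  return top - bottom;
}

// The same difference computed in 64 bits gives the signed value in parentheses.
constexpr std::int64_t signedSize(std::uint32_t top, std::uint32_t bottom) {
  return static_cast<std::int64_t>(top) - static_cast<std::int64_t>(bottom);
}

static_assert(wrappedSize(kReportedTop, kReportedBottom) == 0xfff83000u,
              "matches the size printed in the warning");
static_assert(signedSize(kReportedTop, kReportedBottom) == -512000,
              "matches the signed size printed in the warning");
```

The faulting address and the reported bottom both fall inside the fiber's range, while the reported top equals the upper limit of the thread's original stack, which is consistent with the two stacks being mixed up in the bookkeeping.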
The ASAN failure also seems to always occur in the stl guts of marl::containers::unordered_set<Fiber*> - a type alias for std::unordered_set<Fiber*, std::hash<Fiber*>, std::equal_to<Fiber*>, marl::StlAllocator<Fiber*>>, although I can't see any obvious reason why.
About the annotations, found this thread where the user is implementing coroutines using fibers: https://github.com/google/sanitizers/issues/189#issuecomment-312903956
May be useful.
Yes, I'm familiar with the annotations. I really wouldn't expect to have to use them for Win32 fiber API calls though.
I've locally removed all use of marl::containers for std::, and I'm still seeing the same failures occurring in emplace(). Doesn't look like an issue with StlAllocator.
I have confirmed that this is not reproducible on Linux, using gcc 9.3.0, building with:
cmake .. -GNinja -DMARL_ASAN=1 -DCMAKE_CXX_FLAGS=-m32 -DCMAKE_C_FLAGS=-m32 -DCMAKE_ASM_FLAGS=-m32
I suspect this is a bug with MSVC's ASAN. Will report to Microsoft for further investigation.
Reported to Microsoft:
https://developercommunity.visualstudio.com/content/problem/1225394/possible-x86-asan-false-positives-when-using-win32.html
Microsoft closed the issue with:
We’re not able to prioritize this issue over the other higher-impact issues we receive every week, based on the votes and comments from others in the community and our understanding of the issue. We understand this may be disappointing; we’ve all been there, whether in this project or others we’ve contributed to. However, rest assured that we love your input. If you feel it deserves to stay open, then clarify your use case and contact us to let us know how severe it’s for you.
As I do not believe this is a bug in marl, closing this issue.