tracy
tracy copied to clipboard
`AddressSanitizer: global-buffer-overflow` on client when viewing "memory" pane on server
When running Tracy + ASan on our project, we get a buffer overflow in Tracy a few seconds after we open the "Memory" panel on the Tracy server. It runs fine if we don't open the "Memory" panel.
I'm not sure how to best narrow this down. We use both TracyAlloc
and TracyAllocN
in our code, and we're running Tracy 0.8.2 on Linux.
==856==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000003d9d4ad at pc 0x00000051b327 bp 0x7f1ce87a8bd0 sp 0x7f1ce87a8398
READ of size 32 at 0x000003d9d4ad thread T2 (Tracy Profiler)
#0 0x51b326 in __asan_memcpy (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x51b326)
#1 0x2f24a4b in tracy::Profiler::AppendDataUnsafe(void const*, unsigned long) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyProfiler.hpp:781:9
#2 0x2f24a4b in tracy::Profiler::SendLongString(unsigned long, char const*, unsigned long, tracy::QueueType) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:3016:5
#3 0x2f2e346 in tracy::Profiler::HandleSymbolCodeQuery(unsigned long, unsigned int) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:3807:9
#4 0x2f0b7ea in tracy::Profiler::HandleServerQuery() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:3378:9
#5 0x2f014a3 in tracy::Profiler::Worker() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1867:30
#6 0x302a983 in tracy::Profiler::LaunchWorker(void*) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyProfiler.hpp:739:63
#7 0x303e405 in tracy::Thread::Launch(void*) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyThread.hpp:80:40
#8 0x7f1cf0a1f608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
#9 0x7f1cf0918292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
0x000003d9d4ad is located 0 bytes to the right of global variable '<string literal>' defined in '/home/jorgenpt/proj-main/SpryFox/Common/SomeCode.cpp:282:10' (0x3d9d4a0) of size 13
'<string literal>' is ascii string 'SendCommands'
SUMMARY: AddressSanitizer: global-buffer-overflow (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x51b326) in __asan_memcpy
Shadow bytes around the buggy address:
0x0000807aba40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807aba50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807aba60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807aba70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807aba80: 00 f9 f9 f9 00 00 00 00 00 00 00 00 01 f9 f9 f9
=>0x0000807aba90: f9 f9 f9 f9 00[05]f9 f9 00 00 00 00 00 00 00 00
0x0000807abaa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807abab0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807abac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807abad0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0000807abae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Thread T2 (Tracy Profiler) created by T0 (Main) here:
#0 0x50662c in pthread_create (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x50662c)
#1 0x302a91e in tracy::Thread::Thread(void (*)(void*), void*) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyThread.hpp:69:9
#2 0x2eea382 in tracy::Profiler::SpawnWorkerThreads() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1408:19
#3 0x2ee5339 in tracy::Profiler::Profiler() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1392:5
#4 0x49f6a2 in __cxx_global_var_init.11 /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1304:33
#5 0x49f767 in _GLOBAL__I_000105 /tmp/_fbuild.tmp/0x8e51cc6f/core_13/21F99A05/TracyClient.cpp
#6 0x32ea70c in __libc_csu_init (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x32ea70c)
==856==ABORTING
Is your program dynamically unloading libraries?
No, we don't do any dynamic loading or unloading.
Interesting. The callstack you see is basically doing a memcpy of a memory fragment contained in the memory mapping of the program executable image, or mappings of loaded libraries. I do not see how this could be connected to the memory window in the server UI, this feels like a coincidence. I have also tried profiling a program with address sanitizer enabled and it is working for as expected (i.e. profiler shows the binary code of the instrumentation, so I know the sanitizer is working).
Generally speaking, I have seen significant differences in sanitizer behavior between gcc and clang, and probably also from one version to another. It may so be that the sanitizer you are using treats access to the executable memory area as invalid. After all, this is not something a program typically would do, this is not user-allocated memory, etc.
You may disable executable code transfer with a compile-time define, please refer to the manual to see how to do so.
Here's another one of these I encountered -- it's pointing to a global struct in sqlite (which is definitely statically linked and wouldn't be occupying someone else's state), but the read is oddly large (832 bytes).
==1717==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000005386220 at pc 0x00000051b327 bp 0x7fffef8a8bd0 sp 0x7fffef8a8398
READ of size 832 at 0x000005386220 thread T2 (Tracy Profiler)
#0 0x51b326 in __asan_memcpy (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x51b326)
#1 0x2f24a4b in tracy::Profiler::AppendDataUnsafe(void const*, unsigned long) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyProfiler.hpp:781:9
#2 0x2f24a4b in tracy::Profiler::SendLongString(unsigned long, char const*, unsigned long, tracy::QueueType) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:3016:5
#3 0x2f2e346 in tracy::Profiler::HandleSymbolCodeQuery(unsigned long, unsigned int) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:3807:9
#4 0x2f0b7ea in tracy::Profiler::HandleServerQuery() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:3378:9
#5 0x2f014a3 in tracy::Profiler::Worker() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1867:30
#6 0x302a983 in tracy::Profiler::LaunchWorker(void*) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyProfiler.hpp:739:63
#7 0x303e405 in tracy::Thread::Launch(void*) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyThread.hpp:80:40
#8 0x7ffff7be2608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
#9 0x7ffff7adb292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
0x000005386220 is located 0 bytes to the right of global variable 'aVfs' defined in '/home/jorgenpt/proj-main/SpryFox/Common/Source/Ozy/src/SQLiteCpp/sqlite3/sqlite3.c:41855:22' (0x5385f80) of size 672
SUMMARY: AddressSanitizer: global-buffer-overflow (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x51b326) in __asan_memcpy
Shadow bytes around the buggy address:
0x000080a68bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080a68c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080a68c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080a68c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080a68c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000080a68c40: 00 00 00 00[f9]f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
0x000080a68c50: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
0x000080a68c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080a68c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080a68c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000080a68c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Thread T2 (Tracy Profiler) created by T0 (Main) here:
#0 0x50662c in pthread_create (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x50662c)
#1 0x302a91e in tracy::Thread::Thread(void (*)(void*), void*) /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/common/../client/TracyThread.hpp:69:9
#2 0x2eea382 in tracy::Profiler::SpawnWorkerThreads() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1408:19
#3 0x2ee5339 in tracy::Profiler::Profiler() /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1392:5
#4 0x49f6a2 in __cxx_global_var_init.11 /home/jorgenpt/proj-main/SpryFox/Tracy/Source/tracy/client/TracyProfiler.cpp:1304:33
#5 0x49f767 in _GLOBAL__I_000105 /tmp/_fbuild.tmp/0x8e51cc6f/core_13/21F99A05/TracyClient.cpp
#6 0x32ea70c in __libc_csu_init (/home/jorgenpt/proj-main/Project/Binaries/Linux/ServerDebug+0x32ea70c)
==1717==ABORTING
Inspecting the pointer we're trying to memcpy, it does seem like the address is in fact well-formed and pointing to the VFS struct in sqlite:
frame #5: 0x0000000002f24a4c ServerDebug`tracy::Profiler::SendLongString(unsigned long, char const*, unsigned long, tracy::QueueType) [inlined] tracy::Profiler::AppendDataUnsafe(this=0x0000000005f7c240, data=0x0000000005385f80, len=832) at TracyProfiler.hpp:781:9
778
779 tracy_force_inline void AppendDataUnsafe( const void* data, size_t len )
780 {
-> 781 memcpy( m_buffer + m_bufferOffset, data, len );
782 m_bufferOffset += int( len );
783 }
784
(lldb) p data
(const void *) $12 = 0x0000000005385f80
(lldb) p *(sqlite3_vfs*)0x0000000005385f80
(sqlite3_vfs) $5 = {
iVersion = 3
szOsFile = 120
mxPathname = 512
pNext = 0x0000000005386178
zName = 0x0000000003f430a0 "unix"
pAppData = 0x0000000003f430c0
xOpen = 0x0000000002756fa0 (ServerDebug`unixOpen at sqlite3.c:39876)
xDelete = 0x00000000027583e0 (ServerDebug`unixDelete at sqlite3.c:40148)
xAccess = 0x0000000002758980 (ServerDebug`unixAccess at sqlite3.c:40197)
xFullPathname = 0x0000000002758f20 (ServerDebug`unixFullPathname at sqlite3.c:40313)
xDlOpen = 0x0000000002759d20 (ServerDebug`unixDlOpen at sqlite3.c:40400)
xDlError = 0x0000000002759d50 (ServerDebug`unixDlError at sqlite3.c:40412)
xDlSym = 0x0000000002759db0 (ServerDebug`unixDlSym at sqlite3.c:40422)
xDlClose = 0x0000000002759df0 (ServerDebug`unixDlClose at sqlite3.c:40445)
xRandomness = 0x0000000002759e30 (ServerDebug`unixRandomness at sqlite3.c:40459)
xSleep = 0x000000000275a2a0 (ServerDebug`unixSleep at sqlite3.c:40506)
xCurrentTime = 0x000000000275a370 (ServerDebug`unixCurrentTime at sqlite3.c:40579)
xGetLastError = 0x000000000275a570 (ServerDebug`unixGetLastError at sqlite3.c:40597)
xCurrentTimeInt64 = 0x000000000275a610 (ServerDebug`unixCurrentTimeInt64 at sqlite3.c:40547)
xSetSystemCall = 0x000000000275a950 (ServerDebug`unixSetSystemCall at sqlite3.c:34437)
xGetSystemCall = 0x000000000275b690 (ServerDebug`unixGetSystemCall at sqlite3.c:34479)
xNextSystemCall = 0x000000000275b9b0 (ServerDebug`unixNextSystemCall at sqlite3.c:34495)
}
Which seems like a weird pointer to get back from the backtrace lookup?
but the read is oddly large (832 bytes).
This is what you could reasonably expect from a symbol code fragment.
Which seems like a weird pointer to get back from the backtrace lookup?
It is weird. Program data structures are in no way supposed to be transferred.
It looks like DecodeCallstackPtr
is getting some PC that's not an actual PC. I tried adding -fno-omit-frame-pointer
to see if that would help, but didn't seem to do anything. backtrace_pcinfo
returns useless information (an example is (tracy::CallstackEntry) $1 = (name = "[unknown] + 74771456", file = "[unknown]", line = 0, symLen = 0, symAddr = 0)
), and then backtrace_syminfo
comes back with some symbol that matches the address given, but a size that's larger than the actual contents of the symbol.
Out of curiosity, is the backtrace_syminfo
call only needed to get the size of the function? It looks like the rest of the data is returned by backtrace_pcinfo
as well. Could the call be conditionalized on backtrace_pcinfo
getting a non-zero lowaddr
?
I think it would be best to focus on why you are getting wrong PCs, rather than on workarounds.
I'm going to try to call DecodeCallstackPtr
in Callstack
, which of course is gonna be really slow, but maybe I can get it to trigger the ASan failure when it's got the actual callstack there. Any suggestions for other ways to dig in to what's going on here?
@wolfpld I'm having a little trouble tracking down where this PC is coming from -- we're not building with TRACY_CALLSTACK
and we never call any of the Tracy*S
macros, what else causes the client to collect instruction pointers that it sends to the server?
Further digging in, this is the callchain from the perf events in TracySysTrace.cpp. I added some debugging code to it to symbolicate it in-line, and it tends to hit the ASan error pretty quickly. Some thousands of perf events will symbolicate fine, then we'll receive one where the entire callchain is useless -- one example has 4 entries where symname
is nullptr, then two in STL, and then a pointer to some data in the process. The one named .str.33
is the one with the bogus data.
(lldb) p/x trace[1]
(unsigned long) $8 = 0x00007ffff7a358f2
(lldb) p/x trace[2]
(unsigned long) $10 = 0x00007ffff7a4811a
(lldb) p/x trace[3]
(unsigned long) $11 = 0x00007fffe4a0fda0
(lldb) p/x trace[4]
(unsigned long) $12 = 0x00007fffe4a0fe20
(lldb) p/x trace[5]
(unsigned long) $13 = 0x0000000001b278cc
(lldb) p/x trace[6]
(unsigned long) $14 = 0x0000000002a74c3b
(lldb) p/x trace[7]
(unsigned long) $16 = 0x00000000046c1340
(lldb) p/x trace[8]
(unsigned long) $17 = 0x00007fffe4a10198
(lldb) p/x trace[9]
(unsigned long) $18 = 0x00007fffe4a10198
and their respective symnames:
[0] = 0x0000000000000000
[1] = 0x0000000000000000
[2] = 0x0000000000000000
[3] = 0x0000000000000000
[4] = 0x0000608000000820 "_ZNSt8functionIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvEED2Ev"
[5] = 0x0000603000024910 "_ZNSt7promiseIbE9set_valueEOb"
[6] = 0x00006020000038f0 ".str.33"
[7] = 0x0000000000000000
[8] = 0x0000000000000000
I also had this reproduce on a real Ubuntu box (not WSL, like most my tests, so it's not just a perf event bug in WSL), where I got the following when it triggered:
(lldb) p/x trace[1]
(unsigned long) $6 = 0x000000000301882b
(lldb) p/x trace[2]
(unsigned long) $7 = 0x00007ffff79f3648
(lldb) p/x trace[3]
(unsigned long) $8 = 0x0000000000052658
(lldb) p/x trace[4]
(unsigned long) $9 = 0x0000000003f69c00
(lldb) p/x trace[5]
(unsigned long) $10 = 0x0000000003f69c00
and their respective symnames:
[0] = 0x00006040001eb310 "_ZN5tracyL17elf_symbol_searchEPKvS1_"
[1] = 0x000060200015cdf0 "bsearch"
[2] = 0x000060200015cd90 "sqlite3CtypeMap"
[3] = 0x000060200015cd70 "sqlite3CtypeMap"
[4] = 0x00006030001d2bd0 "_ZN5tracy6Thread6LaunchEPv"
Not entirely sure what could cause the data from the perf events to get corrupted. :/
Any suggestions for further investigation?
A similar problem may have been recently described by Raymond Chen: https://devblogs.microsoft.com/oldnewthing/20220527-00/?p=106689
There is some memory mapping checking already done in Tracy, but only on Android. Maybe it could be extended to include cases such as the one you have encountered?
Commit 970468f9 disables dlclose()
calls (on the condition that the linker can see the symbol, so the call doesn't become a reference to a symbol in libdl
). It may be wortwhile to check if this change fixes your issue.
(Compiling with -DTRACY_VERBOSE
will print which shared object were supposed to be unloaded.)