LeakSanitizer breaks on some shutdowns in the Daily build
e.g. https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=13608&view=results
Tracer caught signal 11: addr=0x0 pc=0x4f4620 sp=0x7f4a60171d30
==1605==LeakSanitizer has encountered a fatal error.
==1605==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==1605==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
Turning on the suggested options does not reveal much more:
2020-10-09T11:54:12.8473188Z 48: ==2952==AddressSanitizer: libc interceptors initialized
2020-10-09T11:54:12.8473533Z 48: || `[0x10007fff8000, 0x7fffffffffff]` || HighMem ||
2020-10-09T11:54:12.8473863Z 48: || `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
2020-10-09T11:54:12.8474199Z 48: || `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap ||
2020-10-09T11:54:12.8474523Z 48: || `[0x00007fff8000, 0x00008fff6fff]` || LowShadow ||
2020-10-09T11:54:12.8474861Z 48: || `[0x000000000000, 0x00007fff7fff]` || LowMem ||
2020-10-09T11:54:12.8475251Z 48: MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
2020-10-09T11:54:12.8475763Z 48: redzone=16
2020-10-09T11:54:12.8475954Z 48: max_redzone=2048
2020-10-09T11:54:12.8476175Z 48: quarantine_size_mb=256M
2020-10-09T11:54:12.8476418Z 48: thread_local_quarantine_size_kb=1024K
2020-10-09T11:54:12.8476660Z 48: malloc_context_size=30
2020-10-09T11:54:12.8476887Z 48: SHADOW_SCALE: 3
2020-10-09T11:54:12.8477185Z 48: SHADOW_GRANULARITY: 8
2020-10-09T11:54:12.8477559Z 48: SHADOW_OFFSET: 0x7fff8000
2020-10-09T11:54:12.8477816Z 48: ==2952==Installed the sigaction for signal 11
2020-10-09T11:54:12.8478107Z 48: ==2952==Installed the sigaction for signal 7
2020-10-09T11:54:12.8478548Z 48: ==2952==Installed the sigaction for signal 8
2020-10-09T11:54:12.8478922Z 48: ==2952==T0: stack [0x7fff4e0dd000,0x7fff4e8dd000) size 0x800000; local=0x7fff4e8db854
2020-10-09T11:54:12.8479251Z 48: ==2952==AddressSanitizer Init done
2020-10-09T11:54:12.8479689Z 48: ==2952==T1: stack [0x7f0edb58a000,0x7f0edbd88f40) size 0x7fef40; local=0x7f0edbd88e34
2020-10-09T11:54:12.8480007Z 48: ==2952==T1 TSDDtor
2020-10-09T11:54:12.8480205Z 48: ==2952==T1 exited
2020-10-09T11:54:12.8480429Z 48: ==3231==Processing thread 2952.
2020-10-09T11:54:12.8481031Z 48: ==3231==Stack at 0x7fff4e0dd000-0x7fff4e8dd000 (SP = 0x7fff4e8db5c8).
2020-10-09T11:54:12.8481574Z 48: ==3231==TLS at 0x7f0eec07f440-0x7f0eec080500.
2020-10-09T11:54:12.8482103Z 48: ==3231==DTLS 4 at 0x1980001020000008-0x1c80000f2000000a.
2020-10-09T11:54:12.8482475Z 48: Tracer caught signal 11: addr=0x0 pc=0x4f4690 sp=0x7f0edb448d30
2020-10-09T11:54:12.8482825Z 48: ==2952==LeakSanitizer has encountered a fatal error.
2020-10-09T11:54:12.8483280Z 48: ==2952==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
2020-10-09T11:54:12.8483946Z 48: ==2952==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
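The pc in the "Tracer caught signal" line differs slightly between runs (0x4f4620, 0x4f4690), so it may be worth pulling it out of the logs and comparing across builds. A minimal sketch of that, assuming the line format stays as shown above; `parse_tracer_line` is a hypothetical helper, not something in the CCF test infra:

```python
import re

# Matches e.g. "Tracer caught signal 11: addr=0x0 pc=0x4f4690 sp=0x7f0edb448d30",
# with or without the CI timestamp / test number prefix in front of it.
TRACER_RE = re.compile(
    r"Tracer caught signal (?P<signal>\d+): "
    r"addr=(?P<addr>0x[0-9a-f]+) pc=(?P<pc>0x[0-9a-f]+) sp=(?P<sp>0x[0-9a-f]+)"
)


def parse_tracer_line(line):
    """Return signal, addr, pc and sp as ints, or None if the line does not match."""
    match = TRACER_RE.search(line)
    if match is None:
        return None
    return {
        "signal": int(match["signal"]),
        "addr": int(match["addr"], 16),
        "pc": int(match["pc"], 16),
        "sp": int(match["sp"], 16),
    }
```

If the binary is not position-independent, the extracted pc could then be fed to a symbolizer such as addr2line to see where the tracer thread faulted. That is an assumption about the build, not something verified here.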
Some potentially helpful information in here: https://github.com/google/sanitizers/issues/723
I don't believe we have any protected memory though.
Also of interest: https://github.com/google/sanitizers/issues/870
It looks like snmalloc may in fact protect some pages when built with USE_POSIX_COMMIT_CHECKS, but that's not the case for us. Let's try ASAN_OPTIONS=fast_unwind_on_malloc=0 and see how slow it is.
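For trying options like this from a test harness, a rough sketch of merging extra sanitizer options into the environment passed to a node process. `with_sanitizer_options` is a hypothetical helper (not part of the CCF infra), and it assumes options are plain `key=value` pairs separated by colons or whitespace, as in the HINT line and the examples in this thread:

```python
import re


def with_sanitizer_options(env, asan=None, lsan=None):
    """Return a copy of env with extra ASAN_OPTIONS / LSAN_OPTIONS merged in.

    Existing options are kept; options passed here override same-named ones.
    """
    merged = dict(env)
    for var, extra in (("ASAN_OPTIONS", asan), ("LSAN_OPTIONS", lsan)):
        if not extra:
            continue
        opts = {}
        # Assumption: simple key=value pairs, no quoted values containing
        # colons or spaces.
        for pair in filter(None, re.split(r"[:\s]+", merged.get(var, ""))):
            key, _, value = pair.partition("=")
            opts[key] = value
        opts.update({key: str(value) for key, value in extra.items()})
        merged[var] = ":".join(f"{key}={value}" for key, value in opts.items())
    return merged
```

The result can be passed as `env=` to `subprocess.Popen`, for example `with_sanitizer_options(os.environ, asan={"fast_unwind_on_malloc": 0})`, so the slow unwinder is only enabled for the runs under investigation.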
No luck with the slow unwind: https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=13938&view=logs&j=383d248c-4494-5797-d98f-6cef5140601e&t=bbcdfa66-260b-5619-96df-9537b476c35e
The issue still happens, and the run is so slow that CI times out.
https://www.chromium.org/developers/testing/leaksanitizer suggests we ought to be using a debug version of libc++:
Using debug versions of shared libraries (NOTE: the libstdc++ part is no longer required for Chromium, where we now use a custom libc++ binary for ASan builds.)
Be aware that ASan's fast stack unwinder depends on frame pointers, which are often missing in release versions of shared libraries. If you want to use the fast unwinder (enabled by default), you should at least install a debug version of libstdc++. This worked for us on Ubuntu:
sudo apt-get install libstdc++6-4.6-dbg
LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu/debug" ASAN_OPTIONS="detect_leaks=1 strict_memcmp=0" out/Release/base_unittests
If you still see incomplete stack traces, you can disable the fast unwinder by adding fast_unwind_on_malloc=0 to ASAN_OPTIONS.
Addressed in #1776
Much less frequent outside docker, but happened again in https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=14777&view=logs&j=88dd69b5-c778-5f1c-f9ac-1398f4203929&t=f18b20af-6d89-55d5-5278-1002871723a4
45: Tracer caught signal 11: addr=0x0 pc=0x4f46b0 sp=0x7feb2c010d30
45: ==3229==LeakSanitizer has encountered a fatal error.
45: ==3229==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
45: ==3229==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
Another instance: https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=14858&view=logs&j=88dd69b5-c778-5f1c-f9ac-1398f4203929&t=f18b20af-6d89-55d5-5278-1002871723a4&l=19155
This time in ws_logging_raft, which is the first time I've noticed it in a test that doesn't involve recovery.
@eddyashton if this is resolved, we need to remove https://github.com/microsoft/CCF/blob/5b1c504cdbdd12850a3272dd2e13a64a5026f555/tests/infra/network.py#L153, but my reading of #5160 suggests to me that it's not.
@achamayou - Correct, this is not yet resolved. I tried removing those lines in #5160, but that resulted in test failures as some nodes are still emitting this on shutdown.
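For context, the kind of workaround being discussed can be sketched as a filter that drops the known LeakSanitizer shutdown lines before a node's stderr is checked for errors. This is an illustration only: the names below are hypothetical and the patterns are taken from the logs in this issue, not copied from tests/infra/network.py.

```python
import re

# Lines emitted by the LeakSanitizer failure described in this issue.
# Hypothetical list for illustration; the real one lives in the test infra.
IGNORED_SHUTDOWN_PATTERNS = [
    re.compile(r"Tracer caught signal 11"),
    re.compile(r"==\d+==LeakSanitizer has encountered a fatal error\."),
    re.compile(r"==\d+==HINT: For debugging, try setting environment variable LSAN_OPTIONS"),
    re.compile(r"==\d+==HINT: LeakSanitizer does not work under ptrace"),
]


def unexpected_error_lines(lines):
    """Return the non-empty lines that match none of the ignored patterns."""
    return [
        line
        for line in lines
        if line.strip()
        and not any(pattern.search(line) for pattern in IGNORED_SHUTDOWN_PATTERNS)
    ]
```

Once no node emits these lines on shutdown any more, the ignore list can be removed and the check can go back to failing on any stderr output.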