
physl running with the address sanitizer

Open stevenrbrandt opened this issue 6 years ago • 14 comments

bash-4.4$ physl --help
=================================================================
==6==ERROR: AddressSanitizer: odr-violation (0x7f9db34ff8c0):
  [1] size=32 'hpx::util::detail::global_fixture' /hpx/src/util/lightweight_test.cpp:56:13
  [2] size=32 'hpx::util::detail::global_fixture' /hpx/src/util/lightweight_test.cpp:56:13
These globals were registered at these points:
  [1]:
    #0 0x7f9dc4620660  (/usr/lib64/clang/7.0.1/lib/libclang_rt.asan-x86_64.so+0x62660)
    #1 0x7f9db2b9a48d  (/usr/local/lib/phylanx/libphylanx_controlsd.so+0x86948d)

  [2]:
    #0 0x7f9dc4620660  (/usr/lib64/clang/7.0.1/lib/libclang_rt.asan-x86_64.so+0x62660)
    #1 0x7f9db33b094d  (/usr/local/lib/phylanx/libphylanx_statisticsd.so+0x61494d)

==6==HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_odr_violation=0
SUMMARY: AddressSanitizer: odr-violation: global 'hpx::util::detail::global_fixture' at /hpx/src/util/lightweight_test.cpp:56:13
==6==ABORTING

stevenrbrandt avatar Jan 23 '19 23:01 stevenrbrandt

If I disable odr violation detection, I get this from physl --help

...
==100==Processing thread 54.
==100==Stack at 0x7ffddf26c000-0x7ffddfa6c000 (SP = 0x7ffddfa6a7c8).
==100==TLS at 0x7f6697fe7b40-0x7f6697fe8c40.
Tracer caught signal 11: addr=0x62700000f000 pc=0x7f66a5cff868 sp=0x7f6678534c40
==54==LeakSanitizer has encountered a fatal error.
==54==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==54==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

stevenrbrandt avatar Jan 23 '19 23:01 stevenrbrandt

I think the reported ODR violation is benign in our context, I will have a look, though. No idea why it crashes however. What does it report if you set the recommended options (LSAN_OPTIONS=verbosity=1:log_threads=1)?

hkaiser avatar Jan 23 '19 23:01 hkaiser

I had to set those options to see the signal 11. It just prints that message regardless.

stevenrbrandt avatar Jan 23 '19 23:01 stevenrbrandt

@stevenrbrandt thanks! The output however is not too useful :/ Is there anything reported otherwise that might shed some light on what's going on?

hkaiser avatar Jan 24 '19 17:01 hkaiser

@hkaiser I made a new version of the sanitizer image with the llvm-symbolizer. Alas, it told me nothing more. addr2line for the address reported ??:0.

stevenrbrandt avatar Jan 29 '19 16:01 stevenrbrandt

A slightly larger physl program,

block(
define(fib,n
  if(n < 2,n,
    fib(n-1)+fib(n-2)
  )),
cout(fib(3)))

Produced a more coherent error message

==158==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7f6d18e3df00; bottom 0x7f6d072dc000; size: 0x000011b61f00 (297148160)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
=================================================================
==158==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f6d072e6628 at pc 0x7f6d4719cf41 bp 0x7f6d072e64f0 sp 0x7f6d072e64e8
WRITE of size 8 at 0x7f6d072e6628 thread T10
    #0 0x7f6d4719cf40 in boost::spirit::qi::detail::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type>::fail_function(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >&, boost::spirit::unused_type const&) /usr/include/boost/spirit/home/qi/detail/fail_function.hpp:28:13
    #1 0x7f6d4719aa43 in boost::spirit::qi::detail::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type> boost::spirit::qi::sequence<boost::fusion::cons<boost::spirit::qi::literal_string<char const (&) [3], true>, boost::fusion::cons<boost::spirit::qi::kleene<boost::spirit::qi::difference<boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::char_, boost::spirit::char_encoding::standard> >, boost::spirit::qi::literal_string<char const (&) [3], true> > >, boost::fusion::cons<boost::spirit::qi::literal_string<char const (&) [3], true>, boost::fusion::nil_> > > >::fail_function<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >, boost::spirit::unused_type>(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::spirit::context<boost::fusion::cons<boost::spirit::unused_type&, boost::fusion::nil_>, boost::fusion::vector<> >&, boost::spirit::unused_type const&) /usr/include/boost/spirit/home/qi/operator/sequence.hpp:51:20
...

However, the "false positives" warning has me concerned that maybe ASAN can't help us because of stack switching.

stevenrbrandt avatar Jan 29 '19 16:01 stevenrbrandt

@sithhell is that error message above something you have seen with your asan runs?

hkaiser avatar Jan 29 '19 22:01 hkaiser

@sithhell, Hartmut tells me you don't see the problems I've seen. Here's how I set up the sanitizer: https://gist.github.com/stevenrbrandt/131d89d1a6bf99810bc56394818bf3c1

I'd be interested in knowing what I've done wrong. :)

stevenrbrandt avatar Jan 30 '19 22:01 stevenrbrandt

@hkaiser @sithhell I ran the fib(3) code on a machine with 40 cores (80 threads). I decided to try on Rostam, and found that the stack-buffer-overflow errors go away. The segfault on shutdown is still there, though.

stevenrbrandt avatar Jan 30 '19 22:01 stevenrbrandt

@stevenrbrandt the stack switching is perfectly fine with asan; there are no false positives there. The errors you are seeing are genuine bugs on our side. Whether they are severe is a different question. Spirit causing stack overflows makes perfect sense given its recursive descent parsing nature. That said, the stack overflow probably depends on how many recursive function calls there actually are. The stack traces coming out of ASAN are always very helpful and very precise, so this should give us some idea. I haven't seen problems because I haven't run any complicated physl code with asan yet. The stack overflow bug will most likely go away if you increase the stack size.
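
One way the stack size suggestion could be tried is sketched below. It rests on assumptions that are not confirmed in this thread: that physl forwards `--hpx:` options to the HPX runtime, that `hpx.stacks.small_size` is the setting governing the stacks involved here, and that the fib(3) program has been saved to a file (the name `fib.physl` is hypothetical). The size shown is an arbitrary illustrative value.

```shell
# Assumption: physl forwards --hpx:* options to the HPX runtime, and
# hpx.stacks.small_size is the setting that governs the stacks used here.
# 0x40000 (256 KiB) is an arbitrary illustrative value, not a recommendation.
HPX_STACK_OPT="--hpx:ini=hpx.stacks.small_size=0x40000"

# Hypothetical file holding the fib(3) program from this thread.
PHYSL_SCRIPT="${PHYSL_SCRIPT:-fib.physl}"

# Run only when physl and the script are both present; otherwise just
# show the command that would be used.
if command -v physl >/dev/null 2>&1 && [ -f "$PHYSL_SCRIPT" ]; then
    physl "$HPX_STACK_OPT" "$PHYSL_SCRIPT"
else
    echo "would run: physl $HPX_STACK_OPT $PHYSL_SCRIPT"
fi
```

If the overflow report disappears with a larger stack, that would support the recursion-depth explanation; if it persists, the trace is probably pointing at something else.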

sithhell avatar Jan 31 '19 21:01 sithhell

@sithhell The issue appears related to something in my environment, not the number of threads on the machine, as running that same image with Singularity did not produce an issue. Regardless, though, I always see the segfault at shutdown with ASAN.

stevenrbrandt avatar Feb 01 '19 18:02 stevenrbrandt

@stevenrbrandt right, the only way to get rid of the segfault right now is to disable the leak sanitizer by setting the environment variable ASAN_OPTIONS=detect_leaks=0. The address sanitizer features, such as detection of heap use after free and stack overflows, remain enabled with that setting.
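
A minimal shell sketch of this workaround, assuming `physl` is on the `PATH` inside the sanitizer image. It also includes `detect_odr_violation=0` from the ASan hint quoted at the top of this issue; drop that part if you want ODR reports to stay on.

```shell
# Combine the two sanitizer switches mentioned in this thread.
# ASan runtime options are colon-separated key=value pairs.
ASAN_OPTIONS="detect_odr_violation=0:detect_leaks=0"
export ASAN_OPTIONS

# Only run physl if it is actually installed (assumption: it is on PATH).
if command -v physl >/dev/null 2>&1; then
    physl --help
else
    echo "physl not found; ASAN_OPTIONS=$ASAN_OPTIONS"
fi
```

Exporting the variable keeps the setting for every later command in the same shell; prefixing a single command with `ASAN_OPTIONS=... physl --help` limits it to that one run.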

sithhell avatar Feb 01 '19 20:02 sithhell

@stevenrbrandt if you configure HPX with -DHPX_WITH_STACKOVERFLOW_DETECTION=Off, the leak sanitizer segfault goes away.
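
A hedged sketch of how that reconfigure step might look from a shell. The source path `/hpx` is taken from the ASan output earlier in this issue; the build directory is a hypothetical placeholder, and the `-S`/`-B` form assumes CMake 3.13 or newer. Any other options your existing HPX build uses would need to be passed as well.

```shell
# Source path as seen in the ASan report above; build path is hypothetical.
HPX_SOURCE_DIR="${HPX_SOURCE_DIR:-/hpx}"
HPX_BUILD_DIR="${HPX_BUILD_DIR:-/hpx/build}"

# The flag below is the one named in this thread; all other settings are
# left at whatever the existing build already uses.
HPX_CMAKE_ARGS="-DHPX_WITH_STACKOVERFLOW_DETECTION=Off"

# Print the command first so it can be inspected before running it.
echo "cmake -S $HPX_SOURCE_DIR -B $HPX_BUILD_DIR $HPX_CMAKE_ARGS"

# Reconfigure only when cmake and the source tree are present.
if command -v cmake >/dev/null 2>&1 && [ -f "$HPX_SOURCE_DIR/CMakeLists.txt" ]; then
    cmake -S "$HPX_SOURCE_DIR" -B "$HPX_BUILD_DIR" "$HPX_CMAKE_ARGS"
fi
```

After reconfiguring, HPX and anything linked against it would need to be rebuilt before the change takes effect.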

sithhell avatar Feb 06 '19 16:02 sithhell

DHPX_WITH_STACKOVERFLOW_DETECTION

Can you please share how I can configure HPX in Docker?

dheerajka29 avatar Aug 18 '20 10:08 dheerajka29