AddressSanitizer: stack-use-after-scope on address in `RecResponseStats::RecResponseStats`

Open jsoref opened this issue 1 month ago • 8 comments

  • Program: Recursor
  • Issue type: Bug report

Short description

AddressSanitizer: stack-use-after-scope on address in RecResponseStats::RecResponseStats in build recursor (autotools, asan+ubsan, full)

Environment

  • Operating system: Ubuntu 24.04.3 LTS
  • Software version: master
  • Software source: PowerDNS repository

Steps to reproduce

  1. Open #16481
  2. Trigger CI
  3. Get this random error

Expected behaviour

No random errors, just errors caused by my own mistakes (there were a handful here)

Actual behaviour

https://github.com/PowerDNS/pdns/actions/runs/19311542898/job/55232765186?pr=16481#step:13:3175

  ../test-filterpo_cc.cc(286): warning: in "test_filter_policies_wildcard_with_enc": Please fix issue #8231
  ../test-filterpo_cc.cc(287): warning: in "test_filter_policies_wildcard_with_enc": Please fix issue #8231
  ../test-filterpo_cc.cc(293): warning: in "test_filter_policies_wildcard_with_enc": Please fix issue #8231
  ../test-filterpo_cc.cc(294): warning: in "test_filter_policies_wildcard_with_enc": Please fix issue #8231
  ../test-filterpo_cc.cc(300): warning: in "test_filter_policies_wildcard_with_enc": Please fix issue #8231
  ../test-filterpo_cc.cc(301): warning: in "test_filter_policies_wildcard_with_enc": Please fix issue #8231
  =================================================================
  ==31205==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f8c6a5a6f00 at pc 0x5651a8b2f74c bp 0x7f8c6a5a2ff0 sp 0x7f8c6a5a27b8
  WRITE of size 2280 at 0x7f8c6a5a6f00 thread T3
      #0 0x5651a8b2f74b in __asan_memset (/__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/testrunner+0xd2074b)
      #1 0x5651a8ef45b5 in RecResponseStats::RecResponseStats() /__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/../rec-responsestats.hh:73:53
      #2 0x5651a91cb715 in rec::Counters::Counters() /__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/../rec-tcounters.hh:223:33
      #3 0x5651a91b4725 in pdns::TLocalCounters<rec::Counters>::TLocalCounters(pdns::GlobalCounters<rec::Counters>&, timeval) /__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/../tcounters.hh:131:3
      #4 0x5651a8a8eac6 in __cxx_global_var_init.10 /__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/../test-rec-tcounters_cc.cc:19:36
      #5 0x5651a982cf7e in __tls_init /__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/../test-rec-tcounters_cc.cc
      #6 0x5651a9829a18 in thread-local wrapper routine for tlocal test-rec-tcounters_cc.cc
      #7 0x5651a9829dc6 in test_rec_tcounters_cc::update_fast::test_method()::$_2::operator()() const /__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/../test-rec-tcounters_cc.cc:53:9
      #8 0x5651a9829dc6 in void std::__invoke_impl<void, test_rec_tcounters_cc::update_fast::test_method()::$_2>(std::__invoke_other, test_rec_tcounters_cc::update_fast::test_method()::$_2&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:61:14
      #9 0x5651a9829c66 in std::__invoke_result<test_rec_tcounters_cc::update_fast::test_method()::$_2>::type std::__invoke<test_rec_tcounters_cc::update_fast::test_method()::$_2>(test_rec_tcounters_cc::update_fast::test_method()::$_2&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/invoke.h:96:14
      #10 0x5651a9829c66 in void std::thread::_Invoker<std::tuple<test_rec_tcounters_cc::update_fast::test_method()::$_2> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:252:13
      #11 0x5651a9829c66 in std::thread::_Invoker<std::tuple<test_rec_tcounters_cc::update_fast::test_method()::$_2> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:259:11
      #12 0x5651a9829c66 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<test_rec_tcounters_cc::update_fast::test_method()::$_2> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_thread.h:210:13
      #13 0x7f8c71bda4a2  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd44a2)
      #14 0x7f8c718ab1f4  (/lib/x86_64-linux-gnu/libc.so.6+0x891f4)
      #15 0x7f8c7192ab3f in clone (/lib/x86_64-linux-gnu/libc.so.6+0x108b3f)
  
  Address 0x7f8c6a5a6f00 is a wild pointer inside of access range of size 0x0000000008e8.
  SUMMARY: AddressSanitizer: stack-use-after-scope (/__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/testrunner+0xd2074b) in __asan_memset
  Shadow bytes around the buggy address:
    0x0ff20d4acd90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4acda0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4acdb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4acdc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4acdd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0ff20d4acde0:[f8]f8 f8 f8 00 00 00 00 f8 00 00 00 00 00 00 00
    0x0ff20d4acdf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4ace10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4ace20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ff20d4ace30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07 
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  Thread T3 created by T0 here:
      #0 0x5651a8b1a6bc in pthread_create (/__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/testrunner+0xd0b6bc)
      #1 0x7f8c71bda578 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd4578)
      #2 0x5651a98251ff in test_rec_tcounters_cc::update_fast_invoker() /__w/pdns/pdns/pdns/recursordist/pdns-recursor-0.0.0-git1/../test-rec-tcounters_cc.cc:44:1
      #3 0x5651a931a831 in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:117:11
  
  ==31205==ABORTING
  FAIL testrunner (exit status: 1)

Other information

jsoref • Nov 12 '25 21:11

This occasionally happens in the testrunner. Both @rgacogne and I tried to diagnose it, without success so far. Will revisit.

omoerbeek • Nov 13 '25 06:11

I'm not able to reproduce this locally on either macOS or Debian trixie. I'm trying to replicate the exact runtime, but I'm running into an issue: the CI base image is Debian bookworm with clang-13, and it then runs an Ubuntu 24.04 based container. Ubuntu 24.04 itself does not have clang-13, so I suppose it used the compiler from the base image?

This makes it harder than necessary to reproduce the runtime on a "real" Ubuntu VM. I'd like to be able to use such a setup, since debugging in a container is such a pain.

omoerbeek • Nov 13 '25 09:11

My understanding is that the Docker host runs an Ubuntu 24.04 based container, but we build and run the unit tests in a Docker container based on the CI base image (Debian bookworm) with clang-13. In theory we should get the same behaviour using clang-13 in a bookworm VM, but since the issue seems to happen randomly, this is very annoying to investigate :-/

rgacogne • Nov 13 '25 09:11

Ah, thanks, all these virtualization layers got me confused.

omoerbeek • Nov 13 '25 10:11

Running on a bookworm VM with a clang-13 build, I'm also not able to reproduce it so far. I will let the test loop run for a few more hours.

omoerbeek • Nov 13 '25 11:11

The test is still running. I also ran it under valgrind, which did not spot any issue.

I'll stop investigating this now, but I'll leave the issue open so we have a better chance of remembering that this is a somewhat rare, but still common enough, unit test issue that has so far only been observed in our CI.

omoerbeek • Nov 13 '25 12:11

@cmouse speculated:

i feel like i could understand the issue, it sounds like the stack allocated thread local is being used via reference
probably because of the operator+= overload
but then again, i might be totally wrong
it looks like it's caused by ++tlocal(something)
++tlocal.at(rec::Counter::servFails);
RecResponseStats& operator+=(const RecResponseStats&);

i wonder if this should be (const RecResponseStats)

jsoref • Nov 13 '25 12:11

That does not make a lot of sense to me. The call on the stack is a += of a counter, not of the whole struct. The issue seems to happen when the thread local is first referenced, and to be related to the initialization of the thread local.

omoerbeek • Nov 13 '25 12:11