MINGW-packages Exception handling is broken on mingw32 when using static runtime libraries.

Open rvogg opened this issue 2 years ago • 40 comments

If you build ccache >= 4.2 for mingw32, it crashes at runtime. This bug started to occur when static linking was enabled for the gcc runtime libraries. (https://github.com/ccache/ccache/pull/732)

The error almost always occurs in the same place. In Util.cpp, read_file raises an exception:

std::string
read_file(const std::string& path, size_t size_hint)
{
  if (size_hint == 0) {
    auto stat = Stat::stat(path);
    if (!stat) {
>>    throw Error(strerror(errno));
    }
    size_hint = stat.size();
  }

But the catch block in the calling function is never reached.

std::string data;
  try {
>>  data = Util::read_file(path);
  } catch (const Error&) {
    // Ignore.
    return counters;
  }

Here is the callstack of "ccache -s" :

msvcrt.dll!msvcrt!_exit (Unknown Source:0)
msvcrt.dll!msvcrt!abort (Unknown Source:0)
uw_init_context_1(struct _Unwind_Context * context, void * outer_cfa, void * outer_ra) (c:\_\M\mingw-w64-gcc\src\gcc-10.3.0\libgcc\unwind-dw2.c:1593)
_Unwind_RaiseException(struct _Unwind_Exception * exc) (c:\_\M\mingw-w64-gcc\src\gcc-10.3.0\libgcc\unwind.inc:93)
__cxxabiv1::__cxa_throw(void * obj, std::type_info * tinfo, void (*)(void *) dest) (c:\_\M\mingw-w64-gcc\src\gcc-10.3.0\libstdc++-v3\libsupc++\eh_throw.cc:90)
Util::read_file(const std::string & path, size_t size_hint) (...\cache-4.3\src\Util.cpp:1164)
Statistics::read(const std::string & path) (...\cache-4.3\src\Statistics.cpp:196)
operator()(const struct {...} * const __closure, const std::string & path) (...\cache-4.3\src\Statistics.cpp:102)
std::__invoke_impl<void, collect_counters(const Config&)::<lambda(const string&)>&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(std::__invoke_other, struct {...} &)(struct {...} & __f) (c:\msys64\mingw32\include\c++\10.3.0\bits\invoke.h:60)
std::__invoke_r<void, collect_counters(const Config&)::<lambda(const string&)>&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(struct {...} &)(struct {...} & __fn) (c:\msys64\mingw32\include\c++\10.3.0\bits\invoke.h:153)
std::_Function_handler<void(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), collect_counters(const Config&)::<lambda(const string&)> >::_M_invoke(const std::_Any_data &, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > &)(const std::_Any_data & __functor,  __args#0) (c:\msys64\mingw32\include\c++\10.3.0\bits\std_function.h:291)
std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const(const std::function<void(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> * const this,  __args#0) (c:\msys64\mingw32\include\c++\10.3.0\bits\std_function.h:622)
for_each_level_1_and_2_stats_file(const std::string &, std::function<void(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>)(const std::string & cache_dir, const std::function<void(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> function) (...\cache-4.3\src\Statistics.cpp:84)
collect_counters(const Config & config) (...\cache-4.3\src\Statistics.cpp:99)
Statistics::format_human_readable[abi:cxx11](Config const&)(const Config & config) (...\cache-4.3\src\Statistics.cpp:281)
handle_main_options(int argc, const char * const * argv) (...\cache-4.3\src\ccache.cpp:2772)
ccache_main(int argc, const char * const * argv) (...\cache-4.3\src\ccache.cpp:2839)
main(int argc, char * const * argv) (...\cache-4.3\src\main.cpp:24)

I have not been able to produce a minimal example for this bug. However, since the prebuilt 32-bit versions of ccache do not have this bug, I assume it is caused by the mingw32 environment and not by ccache.

Aug 04 '21 22:08 rvogg

is this attempting to throw across a module boundary? That can be an issue, but doesn't appear to be the case here as far as I can see from the backtrace.

Aug 05 '21 03:08 jeremyd2019

Hmm, I guess this is also what is hitting a gcc build of clang: if it tries to link to static libgcc, it fails to build.

Aug 05 '21 05:08 revelator

is this attempting to throw across a module boundary? That can be an issue, but doesn't appear to be the case here as far as I can see from the backtrace.

As I said, I tried to create a minimal example where I also threw exceptions over library boundaries. But I could not reproduce the error there. I suspect that ccache makes a system call that breaks the exception handling.

Unfortunately I don't understand enough about exception handling models like dwarf to be able to isolate the issue better.

Aug 05 '21 07:08 rvogg

Seems like this also affects #9088. I purged -static-libgcc from ldflags and the build works without additional changes/patches. Without this, I was consistently getting an ICE on CI.

Aug 05 '21 09:08 Astrum-polaris

guess we should report this upstream, thats a major breakage :O

Aug 06 '21 09:08 revelator

This also affected the TDM build I was maintaining, and it seems to go back further than I thought. The first time this caused problems with the TDM builds was with gcc-8, and it slowly got worse with newer gcc versions. At first it was only the 32-bit compiler that would occasionally bail on code that had worked before, but later it would also fail with static exceptions on 64-bit code. The funny thing about the TDM builds is that they use code that allows throwing exceptions across DLL boundaries even when linked to the static exception runtimes. So this kinda sucked, because before all these problems I could actually build a gcc version of clang that did not rely on the libgcc and libstdc++ DLLs. That is now impossible, unfortunately.

Aug 16 '21 16:08 revelator

Not sure if there is any correlation, but there was a similar problem in ccache with the MIPS toolchain when using the gold linker instead of the bfd linker: https://github.com/ccache/ccache/issues/907

Aug 17 '21 12:08 jrosdahl

wow thats quite a problem :S

Aug 24 '21 10:08 revelator

I ran into the same or a similar bug while debugging ccache built with 64-bit gcc.
A system test failed with an exception, so I tried to debug it with gdb. But it did not hit the expected exception: with the debugger attached, the process died at the same point as the 32-bit version, although in contrast it printed the error message terminate called after throwing an instance of 'core::Error'.

After that, I took some time to look at the bug more closely, and the behavior got weirder the more I looked at it.

32-bit with static linking

The process died on the following assert, without any error message:

    File: unwind-dw2.c
    1578: static void __attribute__((noinline))
    1579: uw_init_context_1 (struct _Unwind_Context *context,
    1580: 		   void *outer_cfa, void *outer_ra)
    1581: {
    1582:   void *ra = __builtin_extract_return_addr (__builtin_return_address (0));
    1583:   _Unwind_FrameState fs;
    1584:   _Unwind_SpTmp sp_slot;
    1585:   _Unwind_Reason_Code code;
    1586: 
    1587:   memset (context, 0, sizeof (struct _Unwind_Context));
    1588:   context->ra = ra;
    1589:   if (!ASSUME_EXTENDED_UNWIND_CONTEXT)
    1590:     context->flags = EXTENDED_CONTEXT_BIT;
    1591: 
    1592:   code = uw_frame_state_for (context, &fs);
>>  1593:   gcc_assert (code == _URC_NO_REASON);
    1594:

The reason lies in the _Unwind_Find_FDE function: both pointers, seen_objects and unseen_objects, are null:

    File: unwind-dw2-fde.c
    1029: const fde *
    1030: _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
    1031: {
    ...
    1051:   /* Linear search through the classified objects, to find the one
    1052:      containing the pc.  Note that pc_begin is sorted descending, and
    1053:      we expect objects to be non-overlapping.  */
>>  1054:   for (ob = seen_objects; ob; ob = ob->next)
    ...
    1061:       }
    1062: 
    1063:   /* Classify and search the objects we've not yet processed.  */
>>  1064:   while ((ob = unseen_objects))
    1065:     {
    ...
    1078:       if (f)
    1079: 	goto fini;
    1080:     }

I tried using a memory breakpoint to see if these pointers are ever set, but couldn't see it.

64-bit static/dynamic

I was searching for the reason for a fmt::v7::format_error. After I attached the debugger to the 64-bit version, the error message was now a core::Error instead of the fmt::v7::format_error.

The position of the exit seems to be the normal place when no catch block was found:

    File: eh_throw.cc
    74: extern "C" void
    75: __cxxabiv1::__cxa_throw (void *obj, std::type_info *tinfo,
    76: 			 void (_GLIBCXX_CDTOR_CALLABI *dest) (void *))
    77: {
    78:   PROBE2 (throw, obj, tinfo);
    79: 
    80:   __cxa_eh_globals *globals = __cxa_get_globals ();
    81:   globals->uncaughtExceptions += 1;
    82:   // Definitely a primary.
    83:   __cxa_refcounted_exception *header =
    84:     __cxa_init_primary_exception(obj, tinfo, dest);
    85:   header->referenceCount = 1;
    86: 
    87: #ifdef __USING_SJLJ_EXCEPTIONS__
    88:   _Unwind_SjLj_RaiseException (&header->exc.unwindHeader);
    89: #else
    90:   _Unwind_RaiseException (&header->exc.unwindHeader);
    91: #endif
    92: 
    93:   // Some sort of unwinding error.  Note that terminate is a handler.
    94:   __cxa_begin_catch (&header->exc.unwindHeader);
>>  95:   std::terminate ();
    96: }

32-bit dynamic

Then I thought I would debug the dynamic 32-bit version, but the gcc-libs package doesn't have any symbols. So I built the gcc packages locally (at first without making any changes to the PKGBUILD) and installed the gcc-libs package.

Now everything was broken!

Every process crashed at startup and it was impossible to debug any process (even with the 64-bit multiarch gdb).

32-bit static/dynamic with new gcc

I thought that the reason for the strange behavior was in the local build of the gcc libs, so I triggered a github action to build a new gcc package. Then I installed the new packages together with cmake and ninja in an empty environment:

pacman --root new_root -Sy
pacman --root new_root -U  mingw-w64-i686-gcc* mingw-w64-i686-libgccjit*
pacman --root new_root -S  mingw-w64-i686-cmake mingw-w64-i686-ninja

After that I started a cmd, added the new environment as the only entry in the PATH variable and built the ccache project.

With static linking the behavior was the same as before, but with dynamic linking the process terminated with the following error message: terminate called after throwing an instance of 'core::Error'

After several attempts to debug the behavior, it seems to be impossible to reproduce it with an attached debugger. Therefore I enabled the JIT debugger to debug it. As far as I understood the code there, the process exited at the normal place for when no catch block is found.

But I was also unable to determine the reason for this behavior. Maybe I can post a callstack when I set up the JIT debugger again.

I'm not sure what is in the current gcc package, but it is not reproducible locally or on github.

Sep 01 '21 23:09 rvogg

It's now broken in the dynamic case too -> #9771

As you correctly predicted I guess

Oct 15 '21 11:10 lazka

I tried rebuilding ccache without -DSTATIC_LINK=OFF with the rebuilt gcc, and it still seems to fail as originally described. Ooh, it's pulling in libgcc_s_dw2-1.dll via libhiredis.dll, I figure that's probably messing things up. (It's also coming in via libzstd.dll.)

Oct 18 '21 02:10 jeremyd2019

Minimal reproducer:

#include <stdio.h>
#include <zstd.h>

int main()
{
        try
        {
                printf("About to throw\n");
#ifdef BREAK_EXCEPTIONS
                printf("Calling zstd: %u\n", ZSTD_versionNumber());
#endif
                throw 42;
                printf("After throw (unreachable)\n");
        }
        catch (...)
        {
                printf("Caught\n");
                return 1;
        }
        return 0;
}

$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd
$ ./testexc
About to throw
Caught
$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd -DBREAK_EXCEPTIONS
$ ./testexc
About to throw
Calling zstd: 10500

I chose zstd pretty arbitrarily, it could be any DLL that's dynamically linked to libgcc.

Oct 18 '21 02:10 jeremyd2019

Interestingly, it works with either -static-libgcc, -static-libstdc++, or neither, but breaks with both

Oct 18 '21 03:10 jeremyd2019

We have fixed some of the unwinding issues. Does this still reproduce?

Mar 09 '22 16:03 mati865

$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd
$ ./testexc
About to throw
Caught
$ g++ -static-libgcc -static-libstdc++ -o testexc.exe testexc.cpp -lzstd -DBREAK_EXCEPTIONS
$ ./testexc
About to throw
Calling zstd: 10500

I still get the same result with gcc 11.3.0

May 18 '22 20:05 lazka

Here is an upstream comment on this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105507#c5

May 19 '22 06:05 lazka

testexc.zip

These were built with my TDM-based toolset, and exceptions work again. I guess it all came down to the grep problem we had.

Oct 15 '22 03:10 revelator
