Retro68
Inexplicable crashes when -Wl,--gc-sections is used without -ffunction-sections
I think there may be a problem with the Retro68 relocation code because the application I'm developing is crashing in inexplicable ways and the behavior is consistent but varies depending on the processor and the CMake build type. My app is a typical Macintosh Toolbox multiple document window app; it's not trying to do anything weird.
The problem didn't start until I added a third-party library, though this library is the entire reason I'm writing this app, so its use is rather important. This library increased the size of the app considerably so I suspect the problem may relate to the size of the code being relocated or the distance between a function and the code it's being called from. The third-party library is being compiled from source along with my app by Retro68. The third-party library is written in C++11, and it uses another bundled library written in C99. Sometimes the crash is in the C++11 library's code. Sometimes it's in the C99 library's code. Sometimes the crash occurs before my main function. Sometimes my app quits immediately after launch with no error message. Sometimes the program counter ends up set to an odd address which of course should never happen. Sometimes my app works fine. It's not an out-of-memory problem or a disk-full problem.
With my current codebase, here's what happens with various configurations:
| Configuration | Debug | MinSizeRel | Release | RelWithDebInfo |
|---|---|---|---|---|
| Power Macintosh G3 Mac OS 9.2.2 (PPC) | 💣 PowerPC unmapped memory exception in C99 lib code | ✅ works fine | ✅ works fine | ✅ works fine |
| Power Macintosh G3 Mac OS 9.2.2 (68K) | 💣 immediate quit before `main`, no error | 💣 bus error somewhere after `main`; PC is odd | 💣 illegal instruction in C++11 lib code; PC is even | 💣 illegal instruction in C++11 lib code; PC is even |
| QEMU Mac Quadra 800 Mac OS 7.6.1 | 💣 immediate quit before `main`, no error | 💣 address error in C++11 lib code; PC is odd | 💣 bus error in C++11 lib code; PC is even | 💣 bus error in C++11 lib code; PC is even |
| QEMU Mac Quadra 800 System 7.1 | 💣 immediate quit before `main`, no error | 💣 address error in C++11 lib code; PC is even | 💣 bus error somewhere after `main`; PC is even | 💣 bus error somewhere after `main`; PC is odd |
| Mini vMac Mac II System 7.1 | 💣 unimplemented instruction before `main`; PC is even | 💣 illegal instruction in C++11 lib code; PC is odd | 💣 hang before `main` somewhere in ROM code | 💣 hang before `main` somewhere in ROM code |
Each test was performed after a system restart. All builds were with:
set_target_properties(app PROPERTIES
COMPILE_OPTIONS "-fdata-sections;-ffunction-sections;-Werror"
LINK_FLAGS "-Wl,--gc-sections"
)
The application size ranges from 1.2 MB (68K MinSizeRel) to 3.5 MB (PPC Debug).
I'd be willing to consider that there's a bug in my code or in these unfamiliar libraries, but it's hard to see how that could cause a crash before `main`. The unimplemented instruction seen with the Debug build on the Mini vMac Mac II is `CPUSHA`, a cache instruction introduced on the 68040, which explains why the error message differs between the 68020 Mac II and the 68040 Quadra, but it doesn't explain why Retro68 would be trying to use that instruction in the first place.
Are there any techniques I can use to try to debug this further?
This sounds like an interesting challenge...
Let's assume it's the compiler toolchain's fault for now - though theoretically, it could also be global constructor code from the library that is the culprit.
First ideas:
You could try the `-Wl,--mac-single` linker flag to shake things up (multi-segment apps are harder, so there's a higher chance of bugs there).
You could try the `sc6` and `sc7` commands in MacsBug to see if they give a hint about where we were before we crashed.
Thanks for the response and for the suggestions! I had been using `sc` before (which I understand is the same as `sc6`) but I didn't know about `sc7`. I'll reply with those results later but first some good news.
I tried `#ifdef`ing out my code that used the libraries and not linking them in. No more crashes. I also wanted to try linking in the libraries without using them, but it looks like the linker is too clever and doesn't include the libraries in the executable since it sees I'm not actually using them, even if I remove `-fdata-sections` and `-ffunction-sections` and `-Wl,--gc-sections`.
It was during this experiment that I realized that the 68K crashes go away if I just remove `-Wl,--gc-sections`! I assume that I would like to be able to use that though, since it is mentioned in several places like in the HelloWorld sample. Keeping `-Wl,--gc-sections` but removing `-fdata-sections` and `-ffunction-sections` still crashed, though in a different way. (It tried to execute the instruction `JSR A6`, which isn't an addressing mode that `JSR` supports.)
And I'm now willing to say that the hang in ROM code I mentioned above for some Mini vMac Mac II System 7.1 builds is a problem in one of the libraries I'm using. The "hang" was in the memory manager routines and was not a hang; the library just seems to be asking for way too much stuff to be moved around in memory, and it gets exponentially worse the more data I ask it to process. Greatly reducing the size of my already-small test case got past that issue, though with `-Wl,--gc-sections` there was still a crash shortly thereafter. I will need to investigate the library more closely. Is there a good way to get profiling information with Retro68 or will I have to add my own rudimentary logging to the library to see what it's calling most?
No change so far on the crash in the PPC Debug build.
> The "hang" was in the memory manager routines

This was caused by the poor performance of `malloc`, which is now filed as #185.
> All builds were with:
>
>     set_target_properties(app PROPERTIES
>         COMPILE_OPTIONS "-fdata-sections;-ffunction-sections;-Werror"
>         LINK_FLAGS "-Wl,--gc-sections"
>     )

Notably, `-fdata-sections` and `-ffunction-sections` were only being added here, for my application's target. They were not being added for any other targets in my project, such as the target for the third-party library. Once I added these flags to the third-party library's target as well, the crashes went away. I'm now able to use `-Wl,--gc-sections` without crashes again. I'm only testing on the QEMU Quadra 800 with System 7.1 right now, but I did verify that all four CMake build types work without crashes.
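For targets that live in the same CMake tree, the per-target version of that change might look roughly like the sketch below. It assumes the library is an ordinary in-tree target; the name `thirdparty` is hypothetical, and `app` is the application target from the earlier snippet.

```cmake
# Sketch only: `thirdparty` is a hypothetical in-tree library target.
# Give every target that ends up in the final link the same section flags,
# so that no object is compiled without them.
foreach(tgt IN ITEMS app thirdparty)
    target_compile_options(${tgt} PRIVATE -fdata-sections -ffunction-sections)
endforeach()

# The linker flag is only needed on the final executable.
set_target_properties(app PROPERTIES LINK_FLAGS "-Wl,--gc-sections")
```

Using `target_compile_options` appends to whatever options a target already has, so it should not disturb flags such as `-Werror` that are set elsewhere.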
Is this expected? When using `-Wl,--gc-sections`, is it required to use `-fdata-sections` and/or `-ffunction-sections` when compiling every single object and static library that will be linked into the application? If so, is that documented anywhere? And could the code (the linker?) be changed to error out (instead of generating a crashing executable) if that requirement is not met?
Or, now that we know more about what caused the crashes, do we have a hope of fixing it? Would the bug be in GNU ld (from binutils) or in Retro68's Elf2Mac wrapper around ld?
Certainly if one is trying to reduce the size of the executable it is desirable to use `-fdata-sections` and/or `-ffunction-sections` for every file, but I was misled by what I thought was a CMake best practice: defining flags for specific targets only, using `set_target_properties` (as several of the Retro68 sample programs do), rather than globally. In the case of these two flags, however, it seems better to set them globally, e.g. by adding this to the project's top-level CMakeLists.txt:
add_compile_options(-fdata-sections;-ffunction-sections)
Note, however, that this will not automatically pass down to targets defined with `ExternalProject_Add`, which I was using for my third-party library. I had to add the flags there explicitly:
ExternalProject_Add(
...
CMAKE_ARGS
"-DCMAKE_C_FLAGS=-fdata-sections -ffunction-sections"
"-DCMAKE_CXX_FLAGS=-fdata-sections -ffunction-sections"
...
...
)
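Putting the two pieces together, a top-level CMakeLists.txt fragment might look something like the following sketch. The project name `thirdparty_ext`, the source path, and the `SECTION_FLAGS` variable are hypothetical, and forwarding the toolchain file and build type is an assumption about how the external build is configured, not something taken from my actual project.

```cmake
include(ExternalProject)

# Hypothetical helper variable so both languages get identical flags.
set(SECTION_FLAGS "-fdata-sections -ffunction-sections")

# Covers targets defined in this directory and below. It does not reach
# ExternalProject builds, which are configured by a separate CMake run.
add_compile_options(-fdata-sections -ffunction-sections)

ExternalProject_Add(thirdparty_ext                       # hypothetical name
    SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/thirdparty"  # hypothetical path
    CMAKE_ARGS
        # Assumption: the external build should use the same toolchain
        # and build type as the main project.
        "-DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE}"
        "-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}"
        "-DCMAKE_C_FLAGS=${SECTION_FLAGS}"
        "-DCMAKE_CXX_FLAGS=${SECTION_FLAGS}"
    INSTALL_COMMAND ""                                   # skip the install step
)
```

Keeping the flags in one variable at least makes it harder for the external build and the main build to drift apart again.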
Earlier I said:
> Keeping `-Wl,--gc-sections` but removing `-fdata-sections` and `-ffunction-sections` still crashed

I've just reconfirmed that this still happens.