elf_diff
Bug in handling migrated symbols
Describe the bug
I have an embedded project for an ARM microcontroller that I need to be able to compile with both the vendor's Eclipse environment (which generates GNU make makefiles) and a CMake environment. Both use the same GCC-based toolchain binaries (currently based on GCC 12.3.1). I am using elf_diff to ensure that the resulting binaries are equal "enough". This has worked fine so far with different configurations (e.g., different linker scripts, applications).
Eventually, I ran into a false positive (the files are reported to differ although, AFAICT, they do not) that also shows a wrong location for the respective symbols in one of the ELF files.
To Reproduce
It's not exactly easy to provide an MWE including the sources (and, assuming this is ARM-specific, you would need the right toolchain too). The main culprit is this function. I could provide you with the ELF files, though I'd rather do that in private because this is work-related and the paths contain the customer and project name :).
I execute elf_diff with --skip_symbol_similarities, --bin_dir pointing to the ARM toolchain used for building, and --bin_prefix "arm-none-eabi-".
I found out some interesting and hopefully at least partially helpful facts:
- Both ELF files work fine in practice as far as execution is concerned.
- The order of the object files during linking is important. I can make the false positive go away by swapping two files around in the linker's command line(!).
- Neither of the two object files involved in the swap contains the symbols reported in the false positive.
- The respective function is an interrupt handler function that is defined with __attribute__ ((weak, section(".after_vectors"))) and has a declared prototype with __attribute__ ((weak)) only. This function is then aliased to 134 other function names with __attribute__ ((weak, alias (...))).
- The multipage HTML output correctly lists all of the function names at the same line where the actual definition is located for one of the ELF files, and consistently on a wrong line and wrong file for the other ELF file.
- Dumping the debug info (with arm-none-eabi-readelf -w) shows a lot of warnings for both sides, including numerous "readelf: Warning: There is a hole [... - ...] in .debug_loc section." and exactly 10 occurrences of "readelf: Warning: Hole and overlap detection requires adjacent view lists and loclists." each. (I don't know why 10 times yet. There are 22 object files involved.)
- This happens only for one linker script configuration where the function in question is mapped to a physical address near 0 (namely to 0x000002ee).
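For reference, the handler pattern described in the list above looks roughly like the sketch below. The handler and alias names are placeholders, not the actual names from the project, and only two of the 134 aliases are shown. The counter is an assumption added so that the sketch can be observed on a hosted build; a real default handler would typically just loop forever.

```c
/* Illustrative counter (not part of the real firmware): lets a hosted build
 * observe that every alias ends up in the same function body. */
volatile unsigned default_handler_hits;

/* The prototype carries only the weak attribute. */
void IntDefaultHandler(void) __attribute__ ((weak));

/* The definition additionally places the code in a dedicated section. */
__attribute__ ((weak, section(".after_vectors")))
void IntDefaultHandler(void)
{
    default_handler_hits++;
}

/* Weak aliases (hypothetical names): each one is a separate symbol that
 * resolves to IntDefaultHandler unless the application overrides it. */
void UART0_IRQHandler(void) __attribute__ ((weak, alias ("IntDefaultHandler")));
void TIMER0_IRQHandler(void) __attribute__ ((weak, alias ("IntDefaultHandler")));
```

Since every alias is just another symbol for the same code, I would expect all of them to share one address and one source location in the debug info, which is what one of the two ELF files shows.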
I couldn't find out what the wrong side is actually pointing to. As I mentioned, the file pointed to does not contain any of the affected symbols at all. The line number is also different, but I could not determine where it is coming from. From all of the above, I think this is either a bug in ld or in the ELF parsing (or both) that is triggered by some peculiar debug info of aliased functions.
Expected behavior
I am not sure exactly. Ideally, it should just work and show the files to be equal. Alternatively, it could try to detect the erroneous circumstance and report it as an error.
Desktop (please complete the following information):
- OS: Debian stable
- Version: 0.7.1 from pip