Add the extra statistics of relative relocations in large binaries

Open PetrShumilov opened this issue 1 year ago • 4 comments

Motivation

I am using the mold linker to quickly link a large monolithic application (over 20 GB with debug information). The primary challenge with my binary is the constantly growing (and sometimes uncontrolled) portion of business logic. Furthermore, the application's dependencies are highly heterogeneous, and I cannot control how they were compiled: with or without PIE, with or without -mcmodel=large. This leads to unpredictable issues during linking; for example, certain relocations (e.g., PC-relative ones) cannot be resolved because the sections containing business logic have become too large (e.g., R_X86_64_32S can only hold a signed 32-bit value, i.e. one within ±2 GB).

Compiling every component of the binary with -mcmodel=large, so that only absolute relocations are produced, is not feasible in my case. Therefore, I need a way to detect the relocations that are closest to overflowing. Given your design principles of determinism and build reproducibility, I can rely on the fact that the resulting binary structure will not change significantly from one build to another.

Solution

For each architecture, there are apply_reloc_alloc and apply_reloc_nonalloc methods in InputSection where the check routine verifies the relocation range depending on the relocation type. We can update the check routine to record the minimum distance to the upper and lower bounds of the range for the current section. After processing all relocation entries of a section, we can update the global minimums in the context.

As a result, we will obtain two new metrics: relative_relocations_offset_infimum and relative_relocations_offset_supremum, which can be interpreted as indicators of "how much space is still available" in the large binary. Although these aren't universal indicators, they may be extremely helpful for managing large monolithic applications.
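The bookkeeping described above could be sketched roughly as follows. This is not mold's code: `RelocHeadroom`, `GlobalRelocStats`, `record`, `merge`, and `atomic_min` are hypothetical names for illustration. It also assumes that the check routine validates a value against a half-open range [lo, hi), and that the two metrics mean "smallest remaining distance to the lower bound" and "smallest remaining distance to the upper bound" respectively.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <limits>

// Hypothetical per-section tracker (not part of mold). It remembers how
// close any checked relocation value came to either end of its range.
struct RelocHeadroom {
  int64_t to_lower = std::numeric_limits<int64_t>::max();
  int64_t to_upper = std::numeric_limits<int64_t>::max();

  // Call next to the range check. Assumes val is already known to lie in
  // the half-open range [lo, hi) and that the range is narrower than
  // 64 bits, so the subtractions cannot overflow.
  void record(int64_t val, int64_t lo, int64_t hi) {
    to_lower = std::min(to_lower, val - lo);
    to_upper = std::min(to_upper, hi - 1 - val);
  }
};

// Lower an atomic to v if v is smaller; safe to call from many threads.
inline void atomic_min(std::atomic<int64_t> &a, int64_t v) {
  int64_t cur = a.load(std::memory_order_relaxed);
  while (v < cur &&
         !a.compare_exchange_weak(cur, v, std::memory_order_relaxed)) {
    // On failure, cur is reloaded with the current value; retry.
  }
}

// Hypothetical stand-in for the two counters kept in the linker context.
struct GlobalRelocStats {
  std::atomic<int64_t> relative_relocations_offset_infimum{
      std::numeric_limits<int64_t>::max()};
  std::atomic<int64_t> relative_relocations_offset_supremum{
      std::numeric_limits<int64_t>::max()};

  // Called once per section, after all of its relocations are applied.
  void merge(const RelocHeadroom &h) {
    atomic_min(relative_relocations_offset_infimum, h.to_lower);
    atomic_min(relative_relocations_offset_supremum, h.to_upper);
  }
};
```

Keeping the per-section minimums in plain integers and touching the shared atomics only once per section is one way to keep contention low when sections are processed in parallel; whether the actual patch does it this way is not stated in the issue.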

Impact

  • The performance impact is minimal, as statistics collection is performed only when the --stats option is enabled.
  • Memory overhead is minimized (as much as possible for this case).
  • Improvements for statistics collection are integrated into the current architecture without significant modifications.
  • New metrics are added that can be valuable for large applications.

PetrShumilov avatar Dec 03 '24 15:12 PetrShumilov

I see the need to do something like this, but I'd like to learn a little bit more about the background.

  • How do you fix a potential issue if you find that a relocation is close to overflow, given that you don't have control over how object files are compiled?

  • How do you manage to build such an extremely large executable without relocation overflow? Do you carefully order object files on the command line so that relocations wouldn't overflow?

  • How large are your text and data segments? How large is your executable after stripping debug info?

  • Do relocations referring to GOT/PLT overflow?

Also, please share some performance numbers for the linker if you have any. I'm just curious how useful mold is compared to other linkers for users like you.

rui314 avatar Dec 04 '24 00:12 rui314

@rui314 Sure,

How do you fix a potential issue if you find that a relocation is close to overflow, given that you don't have control over how object files are compiled?

I use several techniques. For example, based on .map file analysis, I decide which parts of the business logic can be recompiled with -mcmodel=large. As you know, mold does not fully support linker scripts (I agree with your position on linker scripts), but I rely on the ability to reorder object files via the command line. To avoid issues with debug sections, I use 64-bit DWARF. Other useful practices include incremental linking, reducing the number of relations between text and data sections (e.g., by playing with inlining), and experimenting with -Os optimizations.

How do you manage to build such an extremely large executable without relocation overflow? Do you carefully order object files on the command line so that relocations wouldn't overflow?

I know the structure of my binary. Building it requires some tricks and non-standard solutions like the ones I mentioned earlier, including reordering.

How large are your text and data segments? How large is your executable after stripping debug info?

This depends on various factors. Typically, the text section size ranges from 1.9 to 2.6 GB. Unfortunately, I cannot disclose the size of the data segment or the size of the binary after stripping debug info. Of course, debug info accounts for most of the binary.

Do relocations referring to GOT/PLT overflow?

I try to avoid dynamically loaded dependencies. Implementations of standard libraries for C++ and C, runtime libraries, and compiler-dependent routines can be linked statically. This is not a regular practice and can be error-prone in general cases.

Also, please share some performance numbers for the linker if you have any. I'm just curious how useful mold is compared to other linkers for users like you.

Mold is incredibly fast. I can only compare it to GNU ld 2.31.1. I took a recent random build and measured it using time. The host machine has 128 cores.

GNU ld 2.31.1

real    8m55.300s
user    8m25.546s
sys     0m29.708s

mold (latest)

real    0m17.120s
user    0m0.018s
sys     0m0.019s

PetrShumilov avatar Dec 04 '24 13:12 PetrShumilov

Here is my understanding:

You have some control over how object files are compiled, but you don't want to compile everything with -mcmodel=large. Currently, if a build breaks due to relocation overflow, you recompile certain object files with different command line options or rearrange them on the linker command line to resolve the issue. However, this process is painful for build maintainers. Instead of detecting relocation overflows only after they occur, you want to identify potential overflows in advance so that you can fix them before the build breaks. Hence this feature.

Is my understanding correct?

I was contacted by another big-tech company regarding resolving the same issue. I want to create a feature that works for everybody, so please hold off on approving this PR. For now, please maintain this as your local private patch.

rui314 avatar Dec 05 '24 02:12 rui314

@rui314 Yes, your understanding is absolutely correct. Thanks!

PetrShumilov avatar Dec 05 '24 10:12 PetrShumilov