
Deduplicate debug records

Open rui314 opened this issue 3 years ago • 4 comments

Traditionally, Unix linkers treat debug info sections as opaque blobs of bytes. The advantage of this approach is its simplicity; linkers don't have to know anything about the debug info format and can link debug info sections just by applying relocations. The disadvantage is the size of the output debug info sections. Output debug sections contain lots of duplicate type records and debug info for dead sections (for example, debug info for discarded comdats).

With --compress-debug-sections, we can compress debug sections using zlib (or zstd), but that's a hack for the size problem.

This issue is kind of an elephant in the room. It is just accepted as if it's unavoidable, but technically the linker can deduplicate debug info records by parsing and reconstructing debug info.
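To make the idea concrete, here is a minimal sketch of content-based deduplication. It is a toy model, not how any linker does it: each record is assumed to be an already-parsed, self-contained byte string, whereas real DWARF records reference each other by offset. The function name `dedup_records` and the sample records are hypothetical.

```python
import hashlib

def dedup_records(object_files):
    """Merge per-object record lists, keeping one copy of each identical record.

    `object_files` is a list of lists of bytes. Each bytes value stands in for
    one self-contained type record (an assumption; real DWARF is not like this).
    Returns the merged records and, per input file, a map old index -> new index.
    """
    seen = {}     # content digest -> index in `output`
    output = []   # deduplicated records, in first-seen order
    remap = []    # remap[f][i] is the output index of record i of file f
    for records in object_files:
        indices = []
        for rec in records:
            digest = hashlib.sha256(rec).digest()
            if digest not in seen:
                seen[digest] = len(output)
                output.append(rec)
            indices.append(seen[digest])
        remap.append(indices)
    return output, remap

# Two hypothetical object files that share two type records.
a = [b"struct Foo{int x;}", b"int"]
b = [b"int", b"struct Foo{int x;}", b"struct Bar{Foo f;}"]
merged, remap = dedup_records([a, b])
print(len(merged), remap)
```

The `remap` table is the part that makes "reconstructing" necessary: every reference to a record that moved has to be rewritten to point at the surviving copy.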

I don't know how hard it is to parse and reconstruct debug info though. DWARF is pretty complicated; ironically, a major part of its complexity comes from its variable-sized record format which prevented us from compacting DWARF info in an output file.
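One concrete example of the variable-sized format is ULEB128, the variable-length integer encoding that DWARF uses for abbreviation codes and several attribute forms. The sketch below is only an illustration of that encoding, not code from mold; the helper names are made up. It shows that a value's encoded size depends on its magnitude, so rewriting one value after deduplication can shift every byte that follows it.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def decode_uleb128(data, pos=0):
    """Decode one ULEB128 value at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7

# The encoded length changes when a value crosses a 7-bit boundary.
for v in (127, 128, 624485):
    print(v, encode_uleb128(v).hex())
```

With fixed-size fields, a linker could patch a value in place; with an encoding like this, shrinking or growing one field moves everything after it, which is why the records have to be parsed and re-emitted rather than patched.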

rui314 avatar Oct 23 '22 03:10 rui314

For C++, I believe debug fission already supports COMDAT elimination of debug types. However, this does not apply to C, which does not have the ODR.

A similar endeavor is Linux/eBPF's BTF. It's a stripped down version of DWARF, and it's doing something very close to ICF for deduplication. https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

ishitatsuyuki avatar Oct 23 '22 07:10 ishitatsuyuki

I think DWZ is also trying to de-duplicate DWARF entries, typically across different DSOs.

marxin avatar Oct 25 '22 06:10 marxin

Can you run DWZ? It doesn't seem to understand modern DWARF records.

$ dwz -o x mold
dwz: mold: Unknown debugging section .debug_addr

rui314 avatar Oct 25 '22 07:10 rui314

Sure. We use it by default for every package in the openSUSE distribution.

The issue you see is likely related to -gsplit-dwarf, right? What compiler do you use to build mold?

For me it works:

$ dwz --version
dwz version 0.14
...
$ dwz /tmp/mold
$ bloaty /tmp/mold -- /tmp/mold.before
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  -0.0%      -1  [ = ]       0    [Unmapped]
 -53.7% -1023Ki  [ = ]       0    .debug_abbrev
 -39.3% -92.4Mi  [ = ]       0    .debug_info
 -22.0% -93.4Mi  [ = ]       0    TOTAL

marxin avatar Oct 25 '22 10:10 marxin