
Profile-guided dead code elimination

Open rui314 opened this issue 2 years ago • 13 comments

This is another crazy optimization idea that the traditional linker doesn't provide.

Applications tend to contain lots of functions that will never be executed at runtime, even though they look reachable to static analysis. Reachability-based garbage collection cannot eliminate such functions from the output.

But what if the test code coverage is 100%? Then any function that wasn't executed during testing will (by definition) never be executed in production, so it is safe to remove from the output. We could replace each such function with a small stub that reports "unreachable code was executed" and exits.

This optimization has the potential to eliminate a lot of code from an application.

This is also an unsafe optimization, as it is hard to achieve 100% test coverage. I believe there are a few ways to mitigate the risk:

  • Use fuzzing to exercise code paths that run only under error conditions
  • Use code profiling data from previous production releases to find and keep functions that run only in production

Application size matters even in today's computing environment. For example, I heard that on mobile, an application's installation success rate directly correlates with its binary size. So any idea that can significantly reduce binary size has big potential.

Edit: The other idea to be on the safe side is to apply this optimization only to statically-linked libraries. The rationale is that users don't write dead code for their applications, but the libraries the app is linked to tend to bring in lots of unnecessary code.

rui314 avatar Jan 12 '23 04:01 rui314

What are the advantages of making this optimization at link vs compile stage?

yerke avatar Jan 12 '23 04:01 yerke

If you do it at the compile stage, you need to rebuild all object files for a particular configuration, which isn't always easy. For example, if the dynamic analysis tool finds that some function in libz.a is not used in production, you generally don't want to rebuild libz.a, I guess.

rui314 avatar Jan 12 '23 05:01 rui314

This is also an unsafe optimization, as it is hard to achieve 100% test coverage.

What exactly would you measure to treat that 100% as a target? That all statements in the source language have a semantic effect?

mbj avatar Jan 12 '23 14:01 mbj

If 100% of statements in the source language are reached, then 100% of code is live, there's nothing dead to optimize out.

If changed to "every function has either 0% or 100% coverage of source language statements", your definition means something, but you can get false negatives from arrays of function pointers where only some indices are called. And it's difficult to get 100% coverage of assert() calls unless you include a bunch of intentionally-buggy callers.
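The function-pointer caveat can be made concrete with a small sketch (all names are illustrative): if the test suite only ever dispatches opcodes 0 and 1, then op_div has 0% statement coverage and would be deleted under the "0% or 100% per function" rule, even though production code can still reach it through the dispatch table:

```c
/* Handlers reached only through a function-pointer table. A test suite
 * that exercises opcodes 0 and 1 fully covers op_add and op_sub, while
 * op_div shows 0% coverage yet remains reachable at runtime. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_div(int a, int b) { return a / b; } /* never hit in tests */

static int (*const dispatch[])(int, int) = { op_add, op_sub, op_div };

/* Static analysis sees all three handlers as reachable; coverage-based
 * elimination would wrongly drop op_div. */
int run_op(int opcode, int a, int b) {
    return dispatch[opcode](a, b);
}
```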

Weakening it to "functions that are not referenced by anything can be deleted" would be safe, but that already exists: the linker won't extract unused object files from a .a, and if you want finer granularity, you can use -ffunction-sections -Wl,--gc-sections. I'm also not sure whether the linker can remove unused functions from a .o without -ffunction-sections.

Certainly an interesting idea, though.

Alcaro avatar Jan 12 '23 15:01 Alcaro

Our experience in the WebAssembly space may be relevant here. We've investigated this type of idea because wasm binary size is very important on the web.

Our current solution is wasm-split (credit: @tlively) which is also integrated in the Emscripten toolchain. Basically, users run their application in a "profiling" mode that notes which code is reached. We then split the binary into two parts: the first that is loaded immediately, and the second which contains code we think is unused, and is only downloaded and run on demand.

We chose that design because in practice it is too risky to fail entirely if the code we thought was unused ends up used. At least, that is the case for the large and complex applications we are focused on. That is, we ended up not trusting code coverage completely, but there are still substantial benefits to splitting out the unused code: the main application starts up more quickly, and if the second module is never used, we save memory.

I don't know enough about your goals here to know if such splitting in mold could make sense. In principle, though, you can dlopen the second module on demand, or something like that. That would not save binary size on disk, but it would save it in memory, at least.
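A native analogue of that on-demand loading could look roughly like this sketch, where "cold.so" is a hypothetical shared object holding the presumed-dead half of the program (on older glibc you'd link with -ldl):

```c
#include <dlfcn.h>
#include <stddef.h>

/* Lazily map the "cold" module the first time any of its symbols is
 * requested. Returns NULL if the module or the symbol is unavailable,
 * so callers can decide how to fail. All names are hypothetical. */
static void *cold_handle;

void *cold_sym(const char *name) {
    if (!cold_handle)
        cold_handle = dlopen("./cold.so", RTLD_NOW | RTLD_LOCAL);
    return cold_handle ? dlsym(cold_handle, name) : NULL;
}
```

A call site in the hot binary would then route through the loader, e.g. cast the result of cold_sym("rare_error_path") to the right function-pointer type before calling it.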

kripken avatar Jan 13 '23 00:01 kripken

Separating unreachable code into a separate .so is definitely doable, and that's probably what one should do to implement this feature.

The other, even crazier idea is to just ignore errors, an approach often referred to as "failure-oblivious computing". With this approach, we'd replace an unreachable function with one that simply returns, without reporting an error or doing anything to recover. That's not a correct way to fix the problem, but it at least keeps the program running. It has been found that a surprisingly large number of errors of certain types can be ignored [1].
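A failure-oblivious stub would differ from the exiting one only in that it silently pretends success; report_error_stub below is a hypothetical illustration, not anything mold generates:

```c
/* Failure-oblivious variant: instead of trapping, the substituted stub
 * returns a harmless default so the program keeps going. Picking a safe
 * default for each return type is the hard part of this scheme. */
int report_error_stub(const char *msg) {
    (void)msg;   /* swallow the error instead of handling it */
    return 0;    /* pretend the call succeeded */
}
```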

[1] https://www.usenix.org/legacy/event/osdi04/tech/full_papers/rinard/rinard.pdf

rui314 avatar Jan 13 '23 00:01 rui314