dmd icon indicating copy to clipboard operation
dmd copied to clipboard

Fix and enable branch minimization in backend

Open limepoutine opened this issue 1 month ago • 5 comments

There is an incomplete jump minimization pass in the backend, that is never used (since the first SVN revision of D2). This PR modernizes and enables the pass. The algorithm is still quadratic, but pathological case is somewhat rare because of restrictions from the glue layer. I took some care not to reorder if-chains to avoid disrupting branch prediction.

Generally, jump minimization reduces frontend stall and provides 5% improvement for branchy code. There is 2% improvement for Jabba Laci's AoC program and very tiny improvement for code in this SO thread (compiled with -inline -O -release, tested with hyperfine -w 3 -r 15, on an Intel i5-12500H with atom cores disabled):

Benchmark 1: ./aoc_brmin
  Time (mean ± σ):      3.548 s ±  0.007 s    [User: 3.541 s, System: 0.008 s]
  Range (min … max):    3.538 s …  3.559 s    15 runs

Benchmark 2: ./aoc_master
  Time (mean ± σ):      3.619 s ±  0.007 s    [User: 3.610 s, System: 0.010 s]
  Range (min … max):    3.606 s …  3.632 s    15 runs

Summary
  ./aoc_brmin ran
    1.02 ± 0.00 times faster than ./aoc_master

Benchmark 1: ./stackoverflow_brmin
  Time (mean ± σ):      1.037 s ±  0.010 s    [User: 1.036 s, System: 0.001 s]
  Range (min … max):    1.022 s …  1.058 s    15 runs

Benchmark 2: ./stackoverflow_master
  Time (mean ± σ):      1.044 s ±  0.005 s    [User: 1.043 s, System: 0.001 s]
  Range (min … max):    1.037 s …  1.055 s    15 runs

Summary
  ./stackoverflow_brmin ran
    1.01 ± 0.01 times faster than ./stackoverflow_master

Let's see if it breaks anything.

limepoutine avatar Nov 29 '25 09:11 limepoutine

I don't understand quite what this does, but I like the results! (Why is it still in progress?)

WalterBright avatar Dec 01 '25 21:12 WalterBright

Hey, you wrote the original code, I expected there to be some story around this.

It rearranges basic blocks to reduce jumps. For example, a catch block in the middle of a function can be reordered to after the epilogue, so the locality is better. It still has some issues with loops in the current form.

while (true)
{
    if (f())
    {
        g();
        break;
    }
    else
        h();
}

static foreach (i; 0 .. 64) a[i] = i;
return;

The else block will be moved to the end of the function despite the long distance, which gives atrocious locality to the loop.

Lstart:
while (true)
{
    if (f())
    {
        g();
        break;
    }
    else
        goto Lend;
}

static foreach (i; 0 .. 64) a[i] = i;
return;

Lend:
h();
goto Lstart;

Statically it's impossible to determine the hot branch at runtime, so such motion is unacceptable. I'm considering limiting the distance of motion, or just inhibiting motion inside a loop to fix this. I wonder if there are better ways to do this.

limepoutine avatar Dec 02 '25 09:12 limepoutine

We did this for the assert failure branches sometime ago of moving them past the function epilogue.

thewilsonator avatar Dec 02 '25 10:12 thewilsonator

Thanks for your pull request and interest in making D better, @limepoutine! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#22158"

dlang-bot avatar Dec 07 '25 11:12 dlang-bot

Added a completely nonsensical cost model to prevent pathological cases. Anyone has better ideas?

limepoutine avatar Dec 07 '25 11:12 limepoutine