jdk 8342540: InterfaceCalls micro-benchmark gives misleading results

trafficstars

InterfaceCalls.java makes highly predictable memory accesses, which leads to a gross time underestimate of the case where a megamorphic access is unpredictable.

Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4* as long as the benchmark.

Benchmark                            (randomized)  Mode  Cnt   Score   Error  Units
InterfaceCalls.test2ndInt3Types             false  avgt    4   5.013 ± 0.081  ns/op
InterfaceCalls.test2ndInt3Types              true  avgt    4  23.421 ± 0.102  ns/op

This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls.

Progress

[x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
[x] Change must not contain extraneous whitespace
[x] Commit message must refer to an issue

Issue

JDK-8342540: InterfaceCalls micro-benchmark gives misleading results (Enhancement - P4)

Reviewers

Aleksey Shipilev (@shipilev - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21581/head:pull/21581
$ git checkout pull/21581

Update a local copy of the PR:
$ git checkout pull/21581
$ git pull https://git.openjdk.org/jdk.git pull/21581/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21581

View PR using the GUI difftool:
$ git pr show -t 21581

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21581.diff

Webrev

Link to Webrev Comment

Oct 18 '24 11:10 theRealAph

:wave: Welcome back aph! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Oct 18 '24 11:10 bridgekeeper[bot]

@theRealAph This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342540: InterfaceCalls micro-benchmark gives misleading results

Reviewed-by: shade, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 221 new commits pushed to the master branch:

9cfb0f7f7ad31081c917be1eb0e39e2552e45382: 8341527: AVX-512 intrinsic for SHA3
4ce19ca110b6e1eeed7483a1ec7c75fbc1d1b773: 8343190: GHA: Try building JTReg several times
7c800e6bae388dd87986f366787398fe99b4e2ee: 8343026: JFR: Index into fields in the topFrame
d8b3685d36873904248e9701f66459e074a4a8ab: 8342607: Enhance register printing on x86_64 platforms
d8430efb5e159b8e08d2cac66b46cb4ff1112927: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM
6332e258f91789cf50d07a6929f32ff3aaef1a92: 8343183: [s390x]: Problemlist runtime/Monitor/SyncOnValueBasedClassTest.java Failure
79a07ad726f4e4b0502a22a55832960aa1561911: 8343149: Cleanup os::print_tos_pc on AIX
beff8bfe2a5334823b67cb748bc8652dc6a3f3d4: 8342823: Ubsan: ciEnv.cpp:1614:65: runtime error: member call on null pointer of type 'struct CompileTask'
e389f82b1b2365a43fef744936b222328d71494b: 8343137: C2: VerifyLoopOptimizations fails with "Was reachable in only one"
0abfa3ba8f72538f62be838c1ebac8cfbdd14cdf: 8304824: NMT should not use ThreadCritical
... and 211 more: https://git.openjdk.org/jdk/compare/1581508988141bfb420d97759138203f30926b35...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Oct 18 '24 11:10 openjdk[bot]

@theRealAph The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Oct 18 '24 11:10 openjdk[bot]

Webrevs

01: Full - Incremental (52f80b28)
00: Full (c3fa63c5)

Oct 18 '24 11:10 mlbridge[bot]

Is there even a point to do non-randomized test then?

That's a very interesting question, and one that has been occupying me since I discovered this problem a few days ago.

I have no idea how often well-predicted megamorphic calls occur. I can speculate that the "typical" megamorphic case lies somewhere between these extremes, but that is all. It may well be that normal behaviour is chaotic, but I strongly suspect that the cases are unlikely to be equally probable, as they are here. So, it is possible that an utterly unpredictable access pattern is just as unrealistic as a perfectly predictable one.

Oct 18 '24 16:10 theRealAph

I have no idea how often well-predicted megamorphic calls occur. I can speculate that the "typical" megamorphic case lies somewhere between these extremes, but that is all. It may well be that normal behaviour is chaotic, but I strongly suspect that the cases are unlikely to be equally probable, as they are here. So, it is possible that an utterly unpredictable access pattern is just as unrealistic as a perfectly predictable one.

Yeah, all right. We can have a test that brackets the real-world performance between "best" and "worst" cases.

If you want to make "best" and "worst" only differ in the actual payload, maybe we should compute both scrambled and non-scrambled indexes, feed both to Blackhole, and only then select the index based on randomized. This way we would always compute both indexes.

Also stats question: Do you know if scrambling actually produces the full period in requested range, and whether the frequency for individual cases is roughly the same? We kinda assume this xor-step is well distributed and has good enthropy in lower bits, but is it in practice?

Oct 21 '24 09:10 shipilev

I have no idea how often well-predicted megamorphic calls occur. I can speculate that the "typical" megamorphic case lies somewhere between these extremes, but that is all. It may well be that normal behaviour is chaotic, but I strongly suspect that the cases are unlikely to be equally probable, as they are here. So, it is possible that an utterly unpredictable access pattern is just as unrealistic as a perfectly predictable one.

Yeah, all right. We can have a test that brackets the real-world performance between "best" and "worst" cases.

If you want to make "best" and "worst" only differ in the actual payload, maybe we should compute both scrambled and non-scrambled indexes, feed both to Blackhole, and only then select the index based on randomized. This way we would always compute both indexes.

Sure, I can try that, but I doubt it'll affect much. The great virtue of the xorshift generator is to be so lightweight, just a few clocks, that it barely affects anything.

Also stats question: Do you know if scrambling actually produces the full period in requested range, and whether the frequency for individual cases is roughly the same? We kinda assume this xor-step is well distributed and has good enthropy in lower bits, but is it in practice?

I don't think we assume it: the analysis is in Marsaglia's Xorshift RNGs paper. The full period (except 0) is proven there.

The generator I've used here is a compromise: it has to be just good enough to throw off a branch predictor, while still being of very low overhead. There are several elaborations of Xorshift with better statistical properties (see "xoroshiro*") but they are less successful at just getting out of the way.

Oct 21 '24 11:10 theRealAph

I don't think we assume it: the analysis is in Marsaglia's Xorshift RNGs paper. The full period (except 0) is proven there.

Yeah, all right. Seeing is believing, and indeed the distributions look okay:

public class XorShiftHisto {

    static int l;

    static int step(int range) {
        l = scramble(l);
        return (l & Integer.MAX_VALUE) % range;
    }

    static int scramble(int n) {
        int x = n;
        x ^= x << 13;
        x ^= x >>> 17;
        x ^= x << 5;
        return x == 0 ? 1 : x;
    }

    public static void main(String... args) {
        for (int range = 1; range <= 10; range++) {
            l = 0;
            int[] histo = new int[range];
            for (int c = 0; c < 1000000; c++) {
                histo[step(range)]++;
            }
            System.out.println("Histo " + range + ": " + Arrays.toString(histo));
        }
    }
}

$ java XorShiftHisto.java
Histo 1: [1000000]
Histo 2: [500093, 499907]
Histo 3: [333036, 333888, 333076]
Histo 4: [250116, 249824, 249977, 250083]
Histo 5: [200577, 199994, 199940, 199945, 199544]
Histo 6: [166273, 166493, 166425, 166763, 167395, 166651]
Histo 7: [142746, 142358, 142815, 143556, 143004, 142908, 142613]
Histo 8: [124841, 124510, 125431, 124673, 125275, 125314, 124546, 125410]
Histo 9: [110846, 111088, 110885, 110834, 111145, 111201, 111356, 111655, 110990]
Histo 10: [100197, 100061, 99901, 99953, 100070, 100380, 99933, 100039, 99992, 99474]

Oct 21 '24 13:10 shipilev

/integrate

Nov 06 '24 17:11 theRealAph

Going to push as commit 78b378ad03d0f6c85468ac208e84fabea79fc7de. Since your change was applied there have been 341 commits pushed to the master branch:

c0e6c3b93c0d21debc538e0135805c2957053108: 8343214: Fix encoding errors in APX New Data Destination Instructions Support
0be7118b2f761b416ebf8cbb11473d51e80be409: 8279016: JFR Leak Profiler is broken with Shenandoah
6811a11e278118b8b2781f1eaf45d363a3d2db49: 8341408: Implement JEP 488: Primitive Types in Patterns, instanceof, and switch (Second Preview)
72a45ddbad9c343200197348ccfcf74105e6fefa: 8341834: C2 compilation fails with "bad AD file" due to Replicate
57c3bb6091f8ba0caced6f5ecf21dc998ffeee9f: 8343068: C2: CastX2P Ideal transformation not always applied
83f3d42d6bcefac80449987f4d951f8280eeee3a: 8339303: C2: dead node after failing to match cloned address expression
ead0116f2624e0e34529e47e4f509142d588b994: 8331341: secondary_super_cache does not scale well: C1 and interpreter
06d8216a4ef6b883119459da7e52b37d16cd2f03: 8318442: java/net/httpclient/ManyRequests2.java fails intermittently on Linux
bdd68163df4d9b63694bfc0900e4b5ddb2475834: 8343502: RISC-V: SIGBUS in updateBytesCRC32 after JDK-8339738
4431852a880b06241231d346311170331c20ab2d: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor
... and 331 more: https://git.openjdk.org/jdk/compare/1581508988141bfb420d97759138203f30926b35...master

Your commit was automatically rebased without conflicts.

Nov 06 '24 17:11 openjdk[bot]

@theRealAph Pushed as commit 78b378ad03d0f6c85468ac208e84fabea79fc7de.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Nov 06 '24 17:11 openjdk[bot]

jdk jdk copied to clipboard

8342540: InterfaceCalls micro-benchmark gives misleading results

Progress

Issue

Reviewers

Reviewing

Webrev

Webrevs

jdk
jdk copied to clipboard