
Optimize RandomGenerator::nextBytes

Open Glavo opened this issue 1 year ago • 45 comments


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [ ] Commit message must refer to an issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14638/head:pull/14638
$ git checkout pull/14638

Update a local copy of the PR:
$ git checkout pull/14638
$ git pull https://git.openjdk.org/jdk.git pull/14638/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14638

View PR using the GUI difftool:
$ git pr show -t 14638

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14638.diff

Glavo avatar Jun 24 '23 18:06 Glavo

:wave: Welcome back Glavo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Jun 24 '23 18:06 bridgekeeper[bot]

@Glavo The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Jun 24 '23 18:06 openjdk[bot]

Here are the results for the RandomGeneratorNextBytes benchmark (this comment will be updated continuously to show the latest results):

                                                                                          (Baseline)                              (This PR)
Benchmark                                           (algo)  (length)   Mode  Cnt       Score       Error   Units            Score       Error   Units
RandomGeneratorNextBytes.testNextBytes              Random         1  thrpt    5  292124.677 ±  6377.255  ops/ms       346221.250 ± 86860.488  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random         2  thrpt    5  261235.014 ± 15707.040  ops/ms       323470.739 ± 16084.063  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random         4  thrpt    5  240194.023 ±  4417.534  ops/ms       286154.793 ±  2162.091  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random         8  thrpt    5  120707.831 ±  5701.440  ops/ms       156008.005 ±   128.043  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random        16  thrpt    5   63594.497 ±   438.139  ops/ms        78236.080 ±    15.013  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random        32  thrpt    5   35420.287 ±   427.508  ops/ms        39262.435 ±    18.943  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random        64  thrpt    5   17651.831 ±    25.639  ops/ms        19688.311 ±    19.507  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random       128  thrpt    5    8554.908 ±    19.695  ops/ms         9887.630 ±     6.683  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random       256  thrpt    5    4560.283 ±    27.455  ops/ms         4874.348 ±     3.856  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random      1024  thrpt    5    1161.771 ±     2.053  ops/ms         1242.620 ±     0.311  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random      4096  thrpt    5     294.610 ±     0.764  ops/ms          309.557 ±     0.131  ops/ms
RandomGeneratorNextBytes.testNextBytes              Random     16384  thrpt    5      73.885 ±     0.055  ops/ms           77.973 ±     0.038  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         1  thrpt    5  214239.266 ±  1103.018  ops/ms       215641.075 ±  1901.826  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         2  thrpt    5  199700.840 ±   465.203  ops/ms       201313.181 ±  1069.213  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         4  thrpt    5  184605.447 ±  1057.641  ops/ms       184081.550 ±  1068.982  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom         8  thrpt    5  144195.042 ±  2155.839  ops/ms       166970.270 ±    62.509  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom        16  thrpt    5   92010.333 ±   272.006  ops/ms        90731.669 ±  1179.712  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom        32  thrpt    5   45378.019 ±   487.964  ops/ms        54470.769 ±   789.986  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom        64  thrpt    5   24958.803 ±    57.066  ops/ms        29271.323 ±    62.528  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom       128  thrpt    5   12967.609 ±    30.151  ops/ms        15460.181 ±    50.493  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom       256  thrpt    5    6620.502 ±     8.294  ops/ms         7974.591 ±    20.440  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom      1024  thrpt    5    1670.174 ±    14.304  ops/ms         2391.758 ±     1.891  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom      4096  thrpt    5     415.035 ±     0.771  ops/ms          609.107 ±     0.279  ops/ms
RandomGeneratorNextBytes.testNextBytes     L32X64MixRandom     16384  thrpt    5     103.704 ±     0.013  ops/ms          152.771 ±     0.270  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         1  thrpt    5  378919.462 ± 20733.749  ops/ms       382509.180 ±   418.348  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         2  thrpt    5  352209.019 ±   340.381  ops/ms       346027.427 ±  2979.327  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         4  thrpt    5  327951.428 ±   172.418  ops/ms       327855.763 ±   280.082  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus         8  thrpt    5  269875.472 ±    48.783  ops/ms       229580.541 ±    24.469  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus        16  thrpt    5  157786.908 ±   363.565  ops/ms       183664.801 ±    19.788  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus        32  thrpt    5   85927.731 ±  1988.607  ops/ms       135010.073 ±    12.742  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus        64  thrpt    5   45121.367 ±   113.888  ops/ms        90891.031 ±    51.981  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus       128  thrpt    5   23266.361 ±    83.143  ops/ms        52998.113 ±   527.246  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus       256  thrpt    5   10845.534 ±    23.174  ops/ms        29423.939 ±    10.840  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus      1024  thrpt    5    2724.955 ±     1.782  ops/ms         7910.042 ±   175.002  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus      4096  thrpt    5     744.280 ±     0.064  ops/ms         2064.625 ±     0.646  ops/ms
RandomGeneratorNextBytes.testNextBytes  Xoshiro256PlusPlus     16384  thrpt    5     186.613 ±     0.012  ops/ms          573.580 ±     5.850  ops/ms

This PR significantly improves performance for the default implementation in RandomGenerator.

For the Xoshiro256PlusPlus algorithm, when the target array is large, the throughput of this PR is 3.07 times that of the baseline (573.580 vs. 186.613 ops/ms at length 16384).
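As a rough illustration of the kind of change being measured, the sketch below contrasts a byte-at-a-time fill with a fill that stores each 64-bit value in one little-endian write. It is not the PR's code: the thread discusses Unsafe, which is not usable outside java.base, so the public `MethodHandles.byteArrayViewVarHandle` API is swapped in here. The class name, the `LongSupplier` parameter, and the `counter` source are hypothetical and exist only to make the example self-contained.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.function.LongSupplier;

final class NextBytesSketch {
    // Views a byte[] as little-endian longs addressed by byte offset.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Byte-wise shape: peel one byte at a time off each 64-bit value. */
    static void fillByteWise(LongSupplier source, byte[] bytes) {
        int i = 0;
        int len = bytes.length;
        for (int words = len >> 3; words-- > 0; ) {
            long rnd = source.getAsLong();
            for (int n = 8; n-- > 0; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        if (i < len) {
            // Tail: fewer than 8 bytes left, taken from one more value.
            for (long rnd = source.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    /** Word-wise shape: one little-endian 8-byte store per 64-bit value. */
    static void fillWordWise(LongSupplier source, byte[] bytes) {
        int i = 0;
        int len = bytes.length;
        for (int end = len & ~7; i < end; i += Long.BYTES) {
            LONG_LE.set(bytes, i, source.getAsLong());
        }
        if (i < len) {
            // Same tail handling as the byte-wise version.
            for (long rnd = source.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    /** Hypothetical deterministic source so both variants see identical values. */
    static LongSupplier counter(long seed) {
        long[] state = {seed};
        return () -> state[0] += 0x9E3779B97F4A7C15L;
    }

    /** True when both fill strategies produce the same bytes for this length. */
    static boolean sameOutput(int length) {
        byte[] a = new byte[length];
        byte[] b = new byte[length];
        fillByteWise(counter(42), a);
        fillWordWise(counter(42), b);
        return Arrays.equals(a, b);
    }
}
```

Both methods consume the same number of 64-bit values and write them in the same byte order, so any difference between them is purely one of speed.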

Glavo avatar Jun 24 '23 18:06 Glavo

The only confirmed performance degradation (<5%) is when the byte array is empty.

For byte arrays with a length greater than 4 (or 8 for RandomGenerator), we often see a performance improvement of 10% to 30%.

Glavo avatar Jun 24 '23 19:06 Glavo

You should probably update the 2 existing benchmarks in https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomNext.java and https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java, or include your benchmark there.

In IntelliJ IDEA, you can add the micro directory as a module and add the JMH Maven library and the JDK modules as compile-time dependencies, so that IntelliJ can help you work on the benchmarks.

liach avatar Jun 25 '23 00:06 liach

Also, should we use ByteArrayLittleEndian instead of Unsafe, once ByteArrayLittleEndian is no longer dependent on VarHandle?

liach avatar Jun 25 '23 04:06 liach

> Also, should we use ByteArrayLittleEndian instead of Unsafe, once ByteArrayLittleEndian is no longer dependent on VarHandle?

Due to the overhead of unaligned reads and index checks, ByteArrayLittleEndian is slower than calling getLong directly.

I am running benchmarks based on ByteArrayLittleEndian. The current benchmark result is that Unsafe.getLong is 13.76% faster than ByteArrayLittleEndian for L32X64MixRandom (bytes.length = 8).
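For readers unfamiliar with the helper under discussion: ByteArrayLittleEndian is JDK-internal, so the sketch below is only an approximation of the idea, built on the public `MethodHandles.byteArrayViewVarHandle` API. The class and method names are hypothetical. It shows the two properties mentioned above: the access works at any byte offset, aligned or not, and the offset is bounds-checked on every call.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class LittleEndianBytesSketch {
    // Plain get/set through this handle is allowed at any byte offset.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Stores value into array[offset .. offset+7], least significant byte first. */
    static void setLong(byte[] array, int offset, long value) {
        // Throws IndexOutOfBoundsException if fewer than 8 bytes remain at offset.
        LONG_LE.set(array, offset, value);
    }

    /** Reads 8 bytes starting at offset as a little-endian long. */
    static long getLong(byte[] array, int offset) {
        return (long) LONG_LE.get(array, offset);
    }
}
```

An Unsafe-based store skips the bounds check entirely, which is one plausible source of the difference reported for short arrays.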

Glavo avatar Jun 25 '23 04:06 Glavo

> Due to the overhead of unaligned reads and index checks, ByteArrayLittleEndian is slower than calling getLong directly.

Well, it seems that this is not always correct.

For L32X64MixRandom, when the array length is greater than 1024, using ByteArrayLittleEndian is actually 10% faster than getLong. I don't understand why this is happening.

Glavo avatar Jun 25 '23 05:06 Glavo

I think putLongUnaligned tries to do an aligned put when it can; I don't know how C1 or C2 handles it. I think it's a win as long as either Unsafe or VarHandle is faster than the existing manual loop (which could already have been vectorized by C2).

liach avatar Jun 25 '23 05:06 liach

Benchmark results based on the current ByteArrayLittleEndian (VarHandle):

Results

```
Benchmark                        (length)   Mode  Cnt        Score       Error   Units
RandomBenchmark.L32X64MixRandom         0  thrpt    5  1519005.337 ± 10166.724  ops/ms
RandomBenchmark.L32X64MixRandom         1  thrpt    5   215438.181 ±  1296.270  ops/ms
RandomBenchmark.L32X64MixRandom         2  thrpt    5   203155.966 ±  1102.743  ops/ms
RandomBenchmark.L32X64MixRandom         3  thrpt    5   190993.049 ±  1583.488  ops/ms
RandomBenchmark.L32X64MixRandom         4  thrpt    5   184699.083 ±  1656.026  ops/ms
RandomBenchmark.L32X64MixRandom         5  thrpt    5   164362.211 ±  1688.353  ops/ms
RandomBenchmark.L32X64MixRandom         6  thrpt    5   156946.704 ±  1188.623  ops/ms
RandomBenchmark.L32X64MixRandom         7  thrpt    5   153627.148 ±  2754.413  ops/ms
RandomBenchmark.L32X64MixRandom         8  thrpt    5   164011.508 ±    87.110  ops/ms
RandomBenchmark.L32X64MixRandom        10  thrpt    5   101824.800 ±   183.479  ops/ms
RandomBenchmark.L32X64MixRandom        12  thrpt    5    98005.608 ±   188.852  ops/ms
RandomBenchmark.L32X64MixRandom        14  thrpt    5    95530.799 ±   109.554  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5   114617.995 ±    51.252  ops/ms
RandomBenchmark.L32X64MixRandom        32  thrpt    5    54787.870 ±    36.547  ops/ms
RandomBenchmark.L32X64MixRandom        64  thrpt    5    29267.303 ±    17.143  ops/ms
RandomBenchmark.L32X64MixRandom       128  thrpt    5    15590.939 ±     5.373  ops/ms
RandomBenchmark.L32X64MixRandom       256  thrpt    5     8001.160 ±     2.425  ops/ms
RandomBenchmark.L32X64MixRandom       512  thrpt    5     4035.970 ±     1.097  ops/ms
RandomBenchmark.L32X64MixRandom      1024  thrpt    5     2390.227 ±     0.317  ops/ms
RandomBenchmark.L32X64MixRandom      2048  thrpt    5     1210.989 ±     0.190  ops/ms
RandomBenchmark.L32X64MixRandom      4096  thrpt    5      609.188 ±     0.051  ops/ms
RandomBenchmark.L32X64MixRandom      8192  thrpt    5      302.962 ±     1.783  ops/ms
RandomBenchmark.Random                  0  thrpt    5  1511686.595 ± 64669.779  ops/ms
RandomBenchmark.Random                  1  thrpt    5   355958.380 ± 33275.649  ops/ms
RandomBenchmark.Random                  2  thrpt    5   322566.151 ±  2151.769  ops/ms
RandomBenchmark.Random                  3  thrpt    5   291901.421 ±  3873.578  ops/ms
RandomBenchmark.Random                  4  thrpt    5   270129.002 ±    19.117  ops/ms
RandomBenchmark.Random                  5  thrpt    5   135856.891 ±   566.745  ops/ms
RandomBenchmark.Random                  6  thrpt    5   130272.051 ±    61.738  ops/ms
RandomBenchmark.Random                  7  thrpt    5   123843.896 ±   107.200  ops/ms
RandomBenchmark.Random                  8  thrpt    5   159297.447 ±    77.475  ops/ms
RandomBenchmark.Random                 10  thrpt    5    97626.041 ±   420.827  ops/ms
RandomBenchmark.Random                 12  thrpt    5   104838.370 ±    52.721  ops/ms
RandomBenchmark.Random                 14  thrpt    5    75077.145 ±   142.321  ops/ms
RandomBenchmark.Random                 16  thrpt    5    78217.212 ±    17.730  ops/ms
RandomBenchmark.Random                 32  thrpt    5    39289.349 ±     5.522  ops/ms
RandomBenchmark.Random                 64  thrpt    5    19673.761 ±    18.463  ops/ms
RandomBenchmark.Random                128  thrpt    5     9856.985 ±     1.844  ops/ms
RandomBenchmark.Random                256  thrpt    5     4928.253 ±     0.684  ops/ms
RandomBenchmark.Random                512  thrpt    5     2431.380 ±     1.006  ops/ms
RandomBenchmark.Random               1024  thrpt    5     1239.599 ±     0.204  ops/ms
RandomBenchmark.Random               2048  thrpt    5      618.926 ±     0.181  ops/ms
RandomBenchmark.Random               4096  thrpt    5      272.700 ±     1.009  ops/ms
RandomBenchmark.Random               8192  thrpt    5      151.693 ±     0.117  ops/ms
```

Glavo avatar Jun 25 '23 05:06 Glavo

I looked into that a few months ago too, but didn't get around to actually creating a PR, mainly for the following reasons (besides lack of time):

  1. I didn't find any proper tests that ensure that the behavior described in the Javadocs is actually maintained
  2. I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

I think 1. should definitely be addressed. There is java/util/Random/NextBytes.java with a very basic test, but it only covers Random and I think a proper test should put the implementation note directly in code.
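A test along the lines of point 1 could be sketched as follows. It assumes the behavior to preserve is "bytes are taken from successive nextLong() values, least significant byte first, with one extra value for any tail"; that reading, the class name, and the counting generator are assumptions made for illustration, not taken from the existing NextBytes.java test.

```java
import java.util.Arrays;
import java.util.random.RandomGenerator;

final class NextBytesSpecCheck {
    /** Hypothetical generator that implements only nextLong(), so the default nextBytes is exercised. */
    static RandomGenerator counting(long seed) {
        return new RandomGenerator() {
            private long state = seed;

            @Override
            public long nextLong() {
                return state += 0x9E3779B97F4A7C15L;
            }
        };
    }

    /** Reference output: the assumed rule written out directly in code. */
    static byte[] expected(RandomGenerator rng, int length) {
        byte[] out = new byte[length];
        int i = 0;
        while (i < length) {
            long rnd = rng.nextLong();
            // Take up to 8 bytes from this value, low byte first.
            for (int n = Math.min(Long.BYTES, length - i); n-- > 0; rnd >>>= Byte.SIZE) {
                out[i++] = (byte) rnd;
            }
        }
        return out;
    }

    /** Compares the default nextBytes against the reference for one length. */
    static boolean conforms(int length) {
        byte[] actual = new byte[length];
        counting(7).nextBytes(actual);
        return Arrays.equals(actual, expected(counting(7), length));
    }
}
```

Running `conforms` over a range of lengths that includes 0, non-multiples of 8, and values around any internal thresholds would catch an optimized implementation that changes byte order or consumes a different number of values.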

SirYwell avatar Jun 25 '23 06:06 SirYwell

> I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

Personally, I often use it to generate some data for unit testing, so improving its performance would be helpful to me.

> I think 1. should definitely be addressed. There is java/util/Random/NextBytes.java with a very basic test, but it only covers Random and I think a proper test should put the implementation note directly in code.

I agree.

Glavo avatar Jun 25 '23 06:06 Glavo

Looking into the baseline results:

RandomBenchmark.L32X64MixRandom        14  thrpt    5    88666.991 ±   247.778  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5    94277.271 ±   661.097  ops/ms  <-- significantly higher than 14
RandomBenchmark.Random                  6  thrpt    5   121245.951 ±  1767.579  ops/ms
RandomBenchmark.Random                  7  thrpt    5   124512.260 ±  2239.107  ops/ms  <-- higher than 6
RandomBenchmark.Random                  8  thrpt    5   103982.515 ±  2052.329  ops/ms

Auto-vectorization might already be at work for RandomGenerator. We need to show that the gains from Unsafe putLong and VarHandle are reliable rather than incidental side effects of the JIT, which is why I am hesitant to create a JBS issue.

liach avatar Jun 25 '23 06:06 liach

> Auto-vectorization might already be at work for RandomGenerator. We need to show that the gains from Unsafe putLong and VarHandle are reliable rather than incidental side effects of the JIT, which is why I am hesitant to create a JBS issue.

Disassembly (baseline)
============================= C2-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------

Compiled method (c2)     155  693       4       java.util.random.RandomGenerator::nextBytes (100 bytes)
 total in heap  [0x00007f09c84c8c10,0x00007f09c84c9708] = 2808
 relocation     [0x00007f09c84c8d68,0x00007f09c84c8da0] = 56
 main code      [0x00007f09c84c8da0,0x00007f09c84c9570] = 2000
 stub code      [0x00007f09c84c9570,0x00007f09c84c9588] = 24
 oops           [0x00007f09c84c9588,0x00007f09c84c9590] = 8
 metadata       [0x00007f09c84c9590,0x00007f09c84c95a0] = 16
 scopes data    [0x00007f09c84c95a0,0x00007f09c84c9660] = 192
 scopes pcs     [0x00007f09c84c9660,0x00007f09c84c96f0] = 144
 dependencies   [0x00007f09c84c96f0,0x00007f09c84c96f8] = 8
 nul chk table  [0x00007f09c84c96f8,0x00007f09c84c9708] = 16

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Entry Point]
  # {method} {0x000000080017fdf8} 'nextBytes' '([B)V' in 'java/util/random/RandomGenerator'
  # this:     rsi:rsi   = 'java/util/random/RandomGenerator'
  # parm0:    rdx:rdx   = '[B'
  #           [sp+0x40]  (sp of caller)
  0x00007f09c84c8da0:   mov    0x8(%rsi),%r10d
  0x00007f09c84c8da4:   movabs $0x800000000,%r11
  0x00007f09c84c8dae:   add    %r11,%r10
  0x00007f09c84c8db1:   cmp    %r10,%rax
  0x00007f09c84c8db4:   jne    0x00007f09c7da3d80           ;   {runtime_call ic_miss_stub}
  0x00007f09c84c8dba:   xchg   %ax,%ax
  0x00007f09c84c8dbc:   nopl   0x0(%rax)
[Verified Entry Point]
  0x00007f09c84c8dc0:   mov    %eax,-0x14000(%rsp)
  0x00007f09c84c8dc7:   push   %rbp
  0x00007f09c84c8dc8:   sub    $0x30,%rsp
  0x00007f09c84c8dcc:   cmpl   $0x1,0x20(%r15)
  0x00007f09c84c8dd4:   jne    0x00007f09c84c9562
  0x00007f09c84c8dda:   vmovq  %rsi,%xmm0
  0x00007f09c84c8ddf:   mov    0xc(%rdx),%r9d               ; implicit exception: dispatches to 0x00007f09c84c9538
  0x00007f09c84c8de3:   mov    %r9d,%ebp
  0x00007f09c84c8de6:   sar    $0x3,%ebp
  0x00007f09c84c8de9:   xor    %edi,%edi
  0x00007f09c84c8deb:   test   %ebp,%ebp
  0x00007f09c84c8ded:   jle    0x00007f09c84c945a
  0x00007f09c84c8df3:   mov    0x8(%rsi),%r10d
  0x00007f09c84c8df7:   lea    -0x1(%rbp),%r8d
  0x00007f09c84c8dfb:   cmp    $0x1021990,%r10d             ;   {metadata('jdk/random/L32X64MixRandom')}
  0x00007f09c84c8e02:   jne    0x00007f09c84c94ec
  0x00007f09c84c8e08:   mov    %rsi,%rax
  0x00007f09c84c8e0b:   mov    0x10(%rax),%r10d
  0x00007f09c84c8e0f:   mov    0x18(%rax),%esi
  0x00007f09c84c8e12:   mov    0x14(%rax),%r11d
  0x00007f09c84c8e16:   mov    0xc(%rax),%ecx
  0x00007f09c84c8e19:   add    $0xfffffffe,%ebp
  0x00007f09c84c8e1c:   movslq %r9d,%r13
  0x00007f09c84c8e1f:   vmovq  %rdx,%xmm3
  0x00007f09c84c8e24:   mov    %r9d,0xc(%rsp)
  0x00007f09c84c8e29:   vmovd  %r8d,%xmm1
  0x00007f09c84c8e2e:   mov    %ecx,(%rsp)
  0x00007f09c84c8e31:   mov    %r13,0x10(%rsp)
  0x00007f09c84c8e36:   xor    %r11d,%esi
  0x00007f09c84c8e39:   lea    (%r11,%r10,1),%ecx
  0x00007f09c84c8e3d:   lea    0x7(%rdi),%r14d
  0x00007f09c84c8e41:   mov    %ecx,%r9d
  0x00007f09c84c8e44:   shr    $0x10,%r9d
  0x00007f09c84c8e48:   xor    %ecx,%r9d
  0x00007f09c84c8e4b:   movslq %r14d,%rdx
  0x00007f09c84c8e4e:   imul   $0xd36d884b,%r9d,%r9d
  0x00007f09c84c8e55:   add    $0xfffffffffffffff9,%rdx
  0x00007f09c84c8e59:   mov    %r9d,%ecx
  0x00007f09c84c8e5c:   shr    $0x10,%ecx
  0x00007f09c84c8e5f:   xor    %r9d,%ecx
  0x00007f09c84c8e62:   imul   $0xadb4a92d,%r10d,%r9d
  0x00007f09c84c8e69:   add    (%rsp),%r9d
  0x00007f09c84c8e6d:   imul   $0xd36d884b,%ecx,%ebx
  0x00007f09c84c8e73:   imul   $0xadb4a92d,%r9d,%r10d
  0x00007f09c84c8e7a:   add    (%rsp),%r10d
  0x00007f09c84c8e7e:   mov    %r10d,0x10(%rax)
  0x00007f09c84c8e82:   mov    %ebx,%r8d
  0x00007f09c84c8e85:   shr    $0x10,%r8d
  0x00007f09c84c8e89:   xor    %ebx,%r8d
  0x00007f09c84c8e8c:   mov    %esi,%ebx
  0x00007f09c84c8e8e:   shl    $0x9,%ebx
  0x00007f09c84c8e91:   movslq %r8d,%rcx
  0x00007f09c84c8e94:   rorx   $0x6,%r11d,%r8d
  0x00007f09c84c8e9a:   xor    %esi,%r8d
  0x00007f09c84c8e9d:   xor    %ebx,%r8d
  0x00007f09c84c8ea0:   add    %r8d,%r9d
  0x00007f09c84c8ea3:   shl    $0x20,%rcx
  0x00007f09c84c8ea7:   mov    %r9d,%ebx
  0x00007f09c84c8eaa:   shr    $0x10,%ebx
  0x00007f09c84c8ead:   xor    %r9d,%ebx
  0x00007f09c84c8eb0:   rorx   $0x6,%r8d,%r11d
  0x00007f09c84c8eb6:   imul   $0xd36d884b,%ebx,%r9d
  0x00007f09c84c8ebd:   rorx   $0x13,%esi,%ebx
  0x00007f09c84c8ec3:   xor    %r8d,%ebx
  0x00007f09c84c8ec6:   xor    %ebx,%r11d
  0x00007f09c84c8ec9:   mov    %r9d,%r8d
  0x00007f09c84c8ecc:   shr    $0x10,%r8d
  0x00007f09c84c8ed0:   xor    %r9d,%r8d
  0x00007f09c84c8ed3:   rorx   $0x13,%ebx,%esi
  0x00007f09c84c8ed9:   mov    %esi,0x18(%rax)
  0x00007f09c84c8edc:   imul   $0xd36d884b,%r8d,%r9d
  0x00007f09c84c8ee3:   shl    $0x9,%ebx
  0x00007f09c84c8ee6:   xor    %ebx,%r11d
  0x00007f09c84c8ee9:   mov    %r11d,0x14(%rax)
  0x00007f09c84c8eed:   mov    %r9d,%ebx
  0x00007f09c84c8ef0:   shr    $0x10,%ebx
  0x00007f09c84c8ef3:   xor    %r9d,%ebx
  0x00007f09c84c8ef6:   movslq %ebx,%r8
  0x00007f09c84c8ef9:   xor    %rcx,%r8                     ;   {no_reloc}
  0x00007f09c84c8efc:   cmp    0x10(%rsp),%rdx
  0x00007f09c84c8f01:   jae    0x00007f09c84c94d4
  0x00007f09c84c8f07:   cmp    0xc(%rsp),%r14d
  0x00007f09c84c8f0c:   jae    0x00007f09c84c94db
  0x00007f09c84c8f12:   mov    0x458(%r15),%rcx
  0x00007f09c84c8f19:   movslq %edi,%rbx
  0x00007f09c84c8f1c:   mov    %r8d,%r9d
  0x00007f09c84c8f1f:   vmovq  %xmm3,%rdx
  0x00007f09c84c8f24:   mov    %r9b,0x10(%rdx,%rdi,1)
  0x00007f09c84c8f29:   shr    $0x8,%r8
  0x00007f09c84c8f2d:   mov    %r8d,%r9d
  0x00007f09c84c8f30:   mov    %r9b,0x11(%rdx,%rbx,1)
  0x00007f09c84c8f35:   inc    %r14d
  0x00007f09c84c8f38:   shr    $0x8,%r8
  0x00007f09c84c8f3c:   mov    %r8,%rdi
  0x00007f09c84c8f3f:   shr    $0x8,%rdi
  0x00007f09c84c8f43:   mov    %r8d,%r9d
  0x00007f09c84c8f46:   mov    %r9b,0x12(%rdx,%rbx,1)
  0x00007f09c84c8f4b:   mov    %edi,%r9d
  0x00007f09c84c8f4e:   mov    %r9b,0x13(%rdx,%rbx,1)
  0x00007f09c84c8f53:   shr    $0x8,%rdi
  0x00007f09c84c8f57:   mov    %rdi,%rdx
  0x00007f09c84c8f5a:   shr    $0x8,%rdx
  0x00007f09c84c8f5e:   mov    %edi,%r8d
  0x00007f09c84c8f61:   vmovq  %xmm3,%r9
  0x00007f09c84c8f66:   mov    %r8b,0x14(%r9,%rbx,1)
  0x00007f09c84c8f6b:   mov    %edx,%r9d
  0x00007f09c84c8f6e:   vmovq  %xmm3,%r8
  0x00007f09c84c8f73:   mov    %r9b,0x15(%r8,%rbx,1)
  0x00007f09c84c8f78:   shr    $0x8,%rdx
  0x00007f09c84c8f7c:   mov    %rdx,%r9
  0x00007f09c84c8f7f:   shr    $0x8,%r9
  0x00007f09c84c8f83:   mov    %edx,%r8d
  0x00007f09c84c8f86:   vmovq  %xmm3,%rdi
  0x00007f09c84c8f8b:   mov    %r8b,0x16(%rdi,%rbx,1)
  0x00007f09c84c8f90:   mov    %r9d,%r9d
  0x00007f09c84c8f93:   mov    %r9b,0x17(%rdi,%rbx,1)       ; ImmutableOopMap {rdi=Oop rax=Oop xmm0=Oop xmm3=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
  0x00007f09c84c8f98:   test   %eax,(%rcx)                  ;   {poll}
  0x00007f09c84c8f9a:   vmovd  %xmm1,%r8d
  0x00007f09c84c8f9f:   dec    %r8d
  0x00007f09c84c8fa2:   cmp    %ebp,%r8d
  0x00007f09c84c8fa5:   jle    0x00007f09c84c8fb4
  0x00007f09c84c8fa7:   vmovd  %r8d,%xmm1
  0x00007f09c84c8fac:   mov    %r14d,%edi
  0x00007f09c84c8faf:   jmp    0x00007f09c84c8e36
  0x00007f09c84c8fb4:   test   %r8d,%r8d
  0x00007f09c84c8fb7:   jle    0x00007f09c84c94e2
  0x00007f09c84c8fbd:   vmovd  %xmm1,%r9d
  0x00007f09c84c8fc2:   dec    %r9d
  0x00007f09c84c8fc5:   vmovd  %r9d,%xmm2
  0x00007f09c84c8fca:   jmp    0x00007f09c84c8fd5
  0x00007f09c84c8fcc:   nopl   0x0(%rax)
  0x00007f09c84c8fd0:   vmovq  %xmm1,%rax
  0x00007f09c84c8fd5:   lea    (%r11,%r10,1),%edi
  0x00007f09c84c8fd9:   xor    %r11d,%esi
  0x00007f09c84c8fdc:   lea    0x7(%r14),%ecx
  0x00007f09c84c8fe0:   rorx   $0x13,%esi,%edx
  0x00007f09c84c8fe6:   movslq %ecx,%r9
  0x00007f09c84c8fe9:   mov    %esi,%ebp
  0x00007f09c84c8feb:   shl    $0x9,%ebp
  0x00007f09c84c8fee:   add    $0xfffffffffffffff9,%r9
  0x00007f09c84c8ff2:   mov    %edi,%r8d
  0x00007f09c84c8ff5:   shr    $0x10,%r8d
  0x00007f09c84c8ff9:   xor    %edi,%r8d
  0x00007f09c84c8ffc:   imul   $0xadb4a92d,%r10d,%edi
  0x00007f09c84c9003:   add    (%rsp),%edi
  0x00007f09c84c9006:   imul   $0xd36d884b,%r8d,%r8d
  0x00007f09c84c900d:   imul   $0xadb4a92d,%edi,%ebx
  0x00007f09c84c9013:   add    (%rsp),%ebx
  0x00007f09c84c9016:   mov    %ebx,0x10(%rax)
  0x00007f09c84c9019:   mov    %r8d,%r10d
  0x00007f09c84c901c:   shr    $0x10,%r10d
  0x00007f09c84c9020:   xor    %r8d,%r10d
  0x00007f09c84c9023:   rorx   $0x6,%r11d,%r11d
  0x00007f09c84c9029:   xor    %esi,%r11d
  0x00007f09c84c902c:   xor    %ebp,%r11d
  0x00007f09c84c902f:   add    %r11d,%edi
  0x00007f09c84c9032:   xor    %r11d,%edx
  0x00007f09c84c9035:   imul   $0xd36d884b,%r10d,%ebp
  0x00007f09c84c903c:   rorx   $0x13,%edx,%r8d
  0x00007f09c84c9042:   mov    %r8d,0x18(%rax)
  0x00007f09c84c9046:   mov    %ebp,%r10d
  0x00007f09c84c9049:   shr    $0x10,%r10d
  0x00007f09c84c904d:   xor    %ebp,%r10d
  0x00007f09c84c9050:   mov    %edx,%r13d
  0x00007f09c84c9053:   shl    $0x9,%r13d
  0x00007f09c84c9057:   movslq %r10d,%rsi
  0x00007f09c84c905a:   mov    %edi,%ebp
  0x00007f09c84c905c:   shr    $0x10,%ebp
  0x00007f09c84c905f:   xor    %edi,%ebp
  0x00007f09c84c9061:   shl    $0x20,%rsi
  0x00007f09c84c9065:   imul   $0xd36d884b,%ebp,%r10d
  0x00007f09c84c906c:   rorx   $0x6,%r11d,%ebp
  0x00007f09c84c9072:   xor    %edx,%ebp
  0x00007f09c84c9074:   xor    %r13d,%ebp
  0x00007f09c84c9077:   mov    %ebp,0x14(%rax)
  0x00007f09c84c907a:   mov    %r10d,%r11d
  0x00007f09c84c907d:   shr    $0x10,%r11d
  0x00007f09c84c9081:   xor    %r10d,%r11d
  0x00007f09c84c9084:   imul   $0xd36d884b,%r11d,%r11d
  0x00007f09c84c908b:   mov    %r11d,%r10d
  0x00007f09c84c908e:   shr    $0x10,%r10d
  0x00007f09c84c9092:   xor    %r11d,%r10d
  0x00007f09c84c9095:   movslq %r10d,%rdx                   ;   {no_reloc}
  0x00007f09c84c9098:   xor    %rsi,%rdx
  0x00007f09c84c909b:   cmp    0x10(%rsp),%r9
  0x00007f09c84c90a0:   jae    0x00007f09c84c9476
  0x00007f09c84c90a6:   cmp    0xc(%rsp),%ecx
  0x00007f09c84c90aa:   jae    0x00007f09c84c9490
  0x00007f09c84c90b0:   lea    (%rbx,%rbp,1),%edi
  0x00007f09c84c90b3:   mov    %ebp,%r10d
  0x00007f09c84c90b6:   xor    %r8d,%r10d
  0x00007f09c84c90b9:   vmovd  %xmm2,%r9d
  0x00007f09c84c90be:   dec    %r9d
  0x00007f09c84c90c1:   rorx   $0x13,%r10d,%r8d
  0x00007f09c84c90c7:   mov    %r10d,%esi
  0x00007f09c84c90ca:   shl    $0x9,%esi
  0x00007f09c84c90cd:   rorx   $0x6,%ebp,%ebp
  0x00007f09c84c90d3:   xor    %r10d,%ebp
  0x00007f09c84c90d6:   xor    %esi,%ebp
  0x00007f09c84c90d8:   xor    %ebp,%r8d
  0x00007f09c84c90db:   mov    %edi,%r11d
  0x00007f09c84c90de:   shr    $0x10,%r11d
  0x00007f09c84c90e2:   xor    %edi,%r11d
  0x00007f09c84c90e5:   rorx   $0x13,%r8d,%esi
  0x00007f09c84c90eb:   mov    %esi,0x18(%rax)
  0x00007f09c84c90ee:   imul   $0xd36d884b,%r11d,%r11d
  0x00007f09c84c90f5:   mov    %r8d,%edi
  0x00007f09c84c90f8:   shl    $0x9,%edi
  0x00007f09c84c90fb:   mov    %r11d,%r10d
  0x00007f09c84c90fe:   shr    $0x10,%r10d
  0x00007f09c84c9102:   xor    %r11d,%r10d
  0x00007f09c84c9105:   rorx   $0x6,%ebp,%r11d
  0x00007f09c84c910b:   xor    %r8d,%r11d
  0x00007f09c84c910e:   xor    %edi,%r11d
  0x00007f09c84c9111:   mov    %r11d,0x14(%rax)
  0x00007f09c84c9115:   imul   $0xd36d884b,%r10d,%edi
  0x00007f09c84c911c:   imul   $0xadb4a92d,%ebx,%r10d
  0x00007f09c84c9123:   add    (%rsp),%r10d
  0x00007f09c84c9127:   lea    (%r10,%rbp,1),%ebx
  0x00007f09c84c912b:   mov    %edi,%r8d
  0x00007f09c84c912e:   shr    $0x10,%r8d
  0x00007f09c84c9132:   xor    %edi,%r8d
  0x00007f09c84c9135:   mov    %ebx,%ebp
  0x00007f09c84c9137:   shr    $0x10,%ebp
  0x00007f09c84c913a:   xor    %ebx,%ebp
  0x00007f09c84c913c:   movslq %r8d,%r8
  0x00007f09c84c913f:   imul   $0xd36d884b,%ebp,%edi
  0x00007f09c84c9145:   shl    $0x20,%r8
  0x00007f09c84c9149:   mov    %edi,%ebp
  0x00007f09c84c914b:   shr    $0x10,%ebp
  0x00007f09c84c914e:   xor    %edi,%ebp
  0x00007f09c84c9150:   imul   $0xadb4a92d,%r10d,%r10d
  0x00007f09c84c9157:   add    (%rsp),%r10d
  0x00007f09c84c915b:   mov    %r10d,0x10(%rax)
  0x00007f09c84c915f:   vmovq  %rax,%xmm1
  0x00007f09c84c9164:   imul   $0xd36d884b,%ebp,%eax
  0x00007f09c84c916a:   mov    %edx,%ebx
  0x00007f09c84c916c:   vmovq  %xmm3,%rdi
  0x00007f09c84c9171:   mov    %bl,0x10(%rdi,%r14,1)
  0x00007f09c84c9176:   mov    %eax,%edi
  0x00007f09c84c9178:   shr    $0x10,%edi
  0x00007f09c84c917b:   xor    %eax,%edi
  0x00007f09c84c917d:   shr    $0x8,%rdx
  0x00007f09c84c9181:   movslq %edi,%rbx
  0x00007f09c84c9184:   xor    %rbx,%r8
  0x00007f09c84c9187:   mov    %edx,%edi
  0x00007f09c84c9189:   shr    $0x8,%rdx
  0x00007f09c84c918d:   movslq %r14d,%rax
  0x00007f09c84c9190:   vmovq  %xmm3,%rbx
  0x00007f09c84c9195:   mov    %dil,0x11(%rbx,%rax,1)       ;   {no_reloc}
  0x00007f09c84c919a:   mov    %edx,%ebx
  0x00007f09c84c919c:   vmovq  %xmm3,%rdi
  0x00007f09c84c91a1:   mov    %bl,0x12(%rdi,%rax,1)
  0x00007f09c84c91a5:   shr    $0x8,%rdx
  0x00007f09c84c91a9:   mov    %edx,%edi
  0x00007f09c84c91ab:   vmovq  %xmm3,%rbx
  0x00007f09c84c91b0:   mov    %dil,0x13(%rbx,%rax,1)
  0x00007f09c84c91b5:   lea    0x1(%rcx),%edi
  0x00007f09c84c91b8:   lea    0x8(%rcx),%r14d
  0x00007f09c84c91bc:   shr    $0x8,%rdx
  0x00007f09c84c91c0:   movslq %r14d,%rbx
  0x00007f09c84c91c3:   mov    %edx,%ebp
  0x00007f09c84c91c5:   vmovq  %xmm3,%r13
  0x00007f09c84c91ca:   mov    %bpl,0x14(%r13,%rax,1)
  0x00007f09c84c91cf:   add    $0xfffffffffffffff9,%rbx
  0x00007f09c84c91d3:   shr    $0x8,%rdx
  0x00007f09c84c91d7:   mov    %rdx,%r13
  0x00007f09c84c91da:   shr    $0x8,%r13
  0x00007f09c84c91de:   mov    %edx,%ebp
  0x00007f09c84c91e0:   vmovq  %xmm3,%rdx
  0x00007f09c84c91e5:   mov    %bpl,0x15(%rdx,%rax,1)
  0x00007f09c84c91ea:   mov    %r13d,%ebp
  0x00007f09c84c91ed:   mov    %bpl,0x16(%rdx,%rax,1)
  0x00007f09c84c91f2:   shr    $0x8,%r13
  0x00007f09c84c91f6:   mov    %r13d,%ebp
  0x00007f09c84c91f9:   mov    %bpl,0x17(%rdx,%rax,1)
  0x00007f09c84c91fe:   cmp    0x10(%rsp),%rbx
  0x00007f09c84c9203:   jae    0x00007f09c84c9483
  0x00007f09c84c9209:   cmp    0xc(%rsp),%r14d
  0x00007f09c84c920e:   jae    0x00007f09c84c949d
  0x00007f09c84c9214:   mov    0x458(%r15),%rbx
  0x00007f09c84c921b:   movslq %ecx,%rdx
  0x00007f09c84c921e:   mov    %r8d,%edi
  0x00007f09c84c9221:   vmovq  %xmm3,%rcx
  0x00007f09c84c9226:   mov    %dil,0x11(%rcx,%rdx,1)
  0x00007f09c84c922b:   shr    $0x8,%r8
  0x00007f09c84c922f:   mov    %r8d,%ecx
  0x00007f09c84c9232:   vmovq  %xmm3,%rdi
  0x00007f09c84c9237:   mov    %cl,0x12(%rdi,%rdx,1)
  0x00007f09c84c923b:   inc    %r14d
  0x00007f09c84c923e:   shr    $0x8,%r8
  0x00007f09c84c9242:   mov    %r8,%rdi
  0x00007f09c84c9245:   shr    $0x8,%rdi
  0x00007f09c84c9249:   mov    %r8d,%r8d
  0x00007f09c84c924c:   vmovq  %xmm3,%rcx
  0x00007f09c84c9251:   mov    %r8b,0x13(%rcx,%rdx,1)
  0x00007f09c84c9256:   mov    %edi,%ecx
  0x00007f09c84c9258:   vmovq  %xmm3,%r8
  0x00007f09c84c925d:   mov    %cl,0x14(%r8,%rdx,1)
  0x00007f09c84c9262:   shr    $0x8,%rdi
  0x00007f09c84c9266:   mov    %rdi,%rax
  0x00007f09c84c9269:   shr    $0x8,%rax
  0x00007f09c84c926d:   mov    %edi,%r8d
  0x00007f09c84c9270:   vmovq  %xmm3,%rcx
  0x00007f09c84c9275:   mov    %r8b,0x15(%rcx,%rdx,1)
  0x00007f09c84c927a:   mov    %eax,%ecx
  0x00007f09c84c927c:   vmovq  %xmm3,%r8
  0x00007f09c84c9281:   mov    %cl,0x16(%r8,%rdx,1)
  0x00007f09c84c9286:   shr    $0x8,%rax
  0x00007f09c84c928a:   mov    %rax,%rcx
  0x00007f09c84c928d:   shr    $0x8,%rcx
  0x00007f09c84c9291:   mov    %eax,%r8d
  0x00007f09c84c9294:   vmovq  %xmm3,%rdi                   ;   {no_reloc}
  0x00007f09c84c9299:   mov    %r8b,0x17(%rdi,%rdx,1)
  0x00007f09c84c929e:   mov    %ecx,%ecx
  0x00007f09c84c92a0:   mov    %cl,0x18(%rdi,%rdx,1)        ; ImmutableOopMap {rdi=Oop xmm0=Oop xmm1=Oop xmm3=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
  0x00007f09c84c92a4:   test   %eax,(%rbx)                  ;   {poll}
  0x00007f09c84c92a6:   vmovd  %xmm2,%r9d
  0x00007f09c84c92ab:   add    $0xfffffffe,%r9d
  0x00007f09c84c92af:   vmovd  %r9d,%xmm2
  0x00007f09c84c92b4:   test   %r9d,%r9d
  0x00007f09c84c92b7:   jg     0x00007f09c84c8fd0
  0x00007f09c84c92bd:   vmovq  %xmm1,%rax
  0x00007f09c84c92c2:   vmovd  %xmm2,%r8d
  0x00007f09c84c92c7:   cmp    $0xffffffff,%r8d
  0x00007f09c84c92cb:   jle    0x00007f09c84c944d
  0x00007f09c84c92d1:   mov    %r8d,%r13d
  0x00007f09c84c92d4:   lea    (%r11,%r10,1),%r9d
  0x00007f09c84c92d8:   xor    %r11d,%esi
  0x00007f09c84c92db:   lea    0x7(%r14),%edi
  0x00007f09c84c92df:   rorx   $0x13,%esi,%ebx
  0x00007f09c84c92e5:   movslq %edi,%rdx
  0x00007f09c84c92e8:   mov    %esi,%r8d
  0x00007f09c84c92eb:   shl    $0x9,%r8d
  0x00007f09c84c92ef:   add    $0xfffffffffffffff9,%rdx
  0x00007f09c84c92f3:   rorx   $0x6,%r11d,%ebp
  0x00007f09c84c92f9:   xor    %esi,%ebp
  0x00007f09c84c92fb:   xor    %r8d,%ebp
  0x00007f09c84c92fe:   xor    %ebp,%ebx
  0x00007f09c84c9300:   imul   $0xadb4a92d,%r10d,%r11d
  0x00007f09c84c9307:   add    (%rsp),%r11d
  0x00007f09c84c930b:   lea    (%r11,%rbp,1),%ecx
  0x00007f09c84c930f:   rorx   $0x13,%ebx,%esi
  0x00007f09c84c9315:   mov    %esi,0x18(%rax)
  0x00007f09c84c9318:   mov    %ecx,%r8d
  0x00007f09c84c931b:   shr    $0x10,%r8d
  0x00007f09c84c931f:   xor    %ecx,%r8d
  0x00007f09c84c9322:   imul   $0xadb4a92d,%r11d,%r10d
  0x00007f09c84c9329:   add    (%rsp),%r10d
  0x00007f09c84c932d:   mov    %r10d,0x10(%rax)
  0x00007f09c84c9331:   imul   $0xd36d884b,%r8d,%r11d
  0x00007f09c84c9338:   mov    %ebx,%r8d
  0x00007f09c84c933b:   shl    $0x9,%r8d
  0x00007f09c84c933f:   mov    %r11d,%ecx
  0x00007f09c84c9342:   shr    $0x10,%ecx
  0x00007f09c84c9345:   xor    %r11d,%ecx
  0x00007f09c84c9348:   rorx   $0x6,%ebp,%r11d
  0x00007f09c84c934e:   xor    %ebx,%r11d
  0x00007f09c84c9351:   xor    %r8d,%r11d
  0x00007f09c84c9354:   mov    %r11d,0x14(%rax)
  0x00007f09c84c9358:   imul   $0xd36d884b,%ecx,%r8d
  0x00007f09c84c935f:   mov    %r9d,%ebx
  0x00007f09c84c9362:   shr    $0x10,%ebx
  0x00007f09c84c9365:   xor    %r9d,%ebx
  0x00007f09c84c9368:   mov    %r8d,%r9d
  0x00007f09c84c936b:   shr    $0x10,%r9d
  0x00007f09c84c936f:   xor    %r8d,%r9d
  0x00007f09c84c9372:   imul   $0xd36d884b,%ebx,%ebx
  0x00007f09c84c9378:   movslq %r9d,%r8
  0x00007f09c84c937b:   mov    %ebx,%r9d
  0x00007f09c84c937e:   shr    $0x10,%r9d
  0x00007f09c84c9382:   xor    %ebx,%r9d
  0x00007f09c84c9385:   imul   $0xd36d884b,%r9d,%ecx
  0x00007f09c84c938c:   mov    %ecx,%r9d
  0x00007f09c84c938f:   shr    $0x10,%r9d
  0x00007f09c84c9393:   xor    %ecx,%r9d
  0x00007f09c84c9396:   movslq %r9d,%r9
  0x00007f09c84c9399:   shl    $0x20,%r9
  0x00007f09c84c939d:   xor    %r9,%r8
  0x00007f09c84c93a0:   cmp    0x10(%rsp),%rdx              ;   {no_reloc}
  0x00007f09c84c93a5:   jae    0x00007f09c84c94a8
  0x00007f09c84c93ab:   cmp    0xc(%rsp),%edi
  0x00007f09c84c93af:   jae    0x00007f09c84c94a8
  0x00007f09c84c93b5:   mov    0x458(%r15),%rcx
  0x00007f09c84c93bc:   mov    %r8d,%r9d
  0x00007f09c84c93bf:   vmovq  %xmm3,%rbx
  0x00007f09c84c93c4:   mov    %r9b,0x10(%rbx,%r14,1)
  0x00007f09c84c93c9:   inc    %edi
  0x00007f09c84c93cb:   shr    $0x8,%r8
  0x00007f09c84c93cf:   movslq %r14d,%rdx
  0x00007f09c84c93d2:   mov    %r8d,%ebx
  0x00007f09c84c93d5:   vmovq  %xmm3,%r9
  0x00007f09c84c93da:   mov    %bl,0x11(%r9,%rdx,1)
  0x00007f09c84c93df:   shr    $0x8,%r8
  0x00007f09c84c93e3:   mov    %r8,%rbx
  0x00007f09c84c93e6:   shr    $0x8,%rbx
  0x00007f09c84c93ea:   mov    %r8d,%r8d
  0x00007f09c84c93ed:   mov    %r8b,0x12(%r9,%rdx,1)
  0x00007f09c84c93f2:   mov    %ebx,%r9d
  0x00007f09c84c93f5:   vmovq  %xmm3,%r8
  0x00007f09c84c93fa:   mov    %r9b,0x13(%r8,%rdx,1)
  0x00007f09c84c93ff:   shr    $0x8,%rbx
  0x00007f09c84c9403:   mov    %rbx,%r9
  0x00007f09c84c9406:   shr    $0x8,%r9
  0x00007f09c84c940a:   mov    %ebx,%r8d
  0x00007f09c84c940d:   vmovq  %xmm3,%rbx
  0x00007f09c84c9412:   mov    %r8b,0x14(%rbx,%rdx,1)
  0x00007f09c84c9417:   mov    %r9d,%r8d
  0x00007f09c84c941a:   mov    %r8b,0x15(%rbx,%rdx,1)
  0x00007f09c84c941f:   shr    $0x8,%r9
  0x00007f09c84c9423:   mov    %r9,%r8
  0x00007f09c84c9426:   shr    $0x8,%r8
  0x00007f09c84c942a:   mov    %r9d,%r9d
  0x00007f09c84c942d:   mov    %r9b,0x16(%rbx,%rdx,1)
  0x00007f09c84c9432:   mov    %r8d,%r8d
  0x00007f09c84c9435:   mov    %r8b,0x17(%rbx,%rdx,1)       ; ImmutableOopMap {rbx=Oop rax=Oop xmm0=Oop xmm3=Oop }
                                                            ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
  0x00007f09c84c943a:   test   %eax,(%rcx)                  ;   {poll}
  0x00007f09c84c943c:   dec    %r13d
  0x00007f09c84c943f:   cmp    $0xffffffff,%r13d
  0x00007f09c84c9443:   jle    0x00007f09c84c9450
  0x00007f09c84c9445:   mov    %edi,%r14d
  0x00007f09c84c9448:   jmp    0x00007f09c84c92d4
  0x00007f09c84c944d:   mov    %r14d,%edi
  0x00007f09c84c9450:   vmovq  %xmm3,%rdx
  0x00007f09c84c9455:   mov    0xc(%rsp),%r9d
  0x00007f09c84c945a:   cmp    %r9d,%edi
  0x00007f09c84c945d:   jl     0x00007f09c84c9514
  0x00007f09c84c9463:   add    $0x30,%rsp
  0x00007f09c84c9467:   pop    %rbp
  0x00007f09c84c9468:   cmp    0x450(%r15),%rsp             ;   {poll_return}
  0x00007f09c84c946f:   ja     0x00007f09c84c954c
  0x00007f09c84c9475:   ret
  0x00007f09c84c9476:   mov    %rdx,%r8
  0x00007f09c84c9479:   vmovd  %xmm2,%r9d
  0x00007f09c84c947e:   mov    %r14d,%edi
  0x00007f09c84c9481:   jmp    0x00007f09c84c9488
  0x00007f09c84c9483:   vmovq  %xmm1,%rax
  0x00007f09c84c9488:   mov    %r9d,%r13d
  0x00007f09c84c948b:   mov    %edi,%r14d
  0x00007f09c84c948e:   jmp    0x00007f09c84c94a8
  0x00007f09c84c9490:   mov    %rdx,%r8
  0x00007f09c84c9493:   vmovd  %xmm2,%r9d
  0x00007f09c84c9498:   mov    %r14d,%edi
  0x00007f09c84c949b:   jmp    0x00007f09c84c94a2
  0x00007f09c84c949d:   vmovq  %xmm1,%rax
  0x00007f09c84c94a2:   mov    %r9d,%r13d
  0x00007f09c84c94a5:   mov    %edi,%r14d
  0x00007f09c84c94a8:   mov    $0xffffff76,%esi
  0x00007f09c84c94ad:   mov    %rax,%rbp
  0x00007f09c84c94b0:   vmovsd %xmm3,(%rsp)
  0x00007f09c84c94b5:   mov    %r14d,0x8(%rsp)
  0x00007f09c84c94ba:   mov    %r13d,0x10(%rsp)
  0x00007f09c84c94bf:   mov    %r8,0x18(%rsp)
  0x00007f09c84c94c4:   data16 xchg %ax,%ax
  0x00007f09c84c94c7:   call   0x00007f09c7da9c00           ; ImmutableOopMap {rbp=Oop [0]=Oop }
                                                            ;*ifle {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@35 (line 486)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c94cc:   nopl   0x30008bc(%rax,%rax,1)       ;   {other}
  0x00007f09c84c94d4:   vmovd  %xmm1,%r9d
  0x00007f09c84c94d9:   jmp    0x00007f09c84c9488
  0x00007f09c84c94db:   vmovd  %xmm1,%r9d
  0x00007f09c84c94e0:   jmp    0x00007f09c84c94a2
  0x00007f09c84c94e2:   vmovd  %r8d,%xmm2
  0x00007f09c84c94e7:   jmp    0x00007f09c84c92c2
  0x00007f09c84c94ec:   mov    $0xffffff76,%esi
  0x00007f09c84c94f1:   mov    %rdx,(%rsp)
  0x00007f09c84c94f5:   mov    %r9d,0x8(%rsp)
  0x00007f09c84c94fa:   mov    %r8d,0xc(%rsp)
  0x00007f09c84c94ff:   vmovsd %xmm0,0x10(%rsp)
  0x00007f09c84c9505:   xchg   %ax,%ax
  0x00007f09c84c9507:   call   0x00007f09c7da9c00           ; ImmutableOopMap {[0]=Oop [16]=Oop }
                                                            ;*ifle {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@15 (line 484)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c950c:   nopl   0x40008fc(%rax,%rax,1)       ;   {other}
  0x00007f09c84c9514:   mov    $0xffffff45,%esi
  0x00007f09c84c9519:   mov    %rdx,%rbp
  0x00007f09c84c951c:   mov    %edi,0x8(%rsp)
  0x00007f09c84c9520:   mov    %r9d,0xc(%rsp)
  0x00007f09c84c9525:   vmovsd %xmm0,0x10(%rsp)
  0x00007f09c84c952b:   call   0x00007f09c7da9c00           ; ImmutableOopMap {rbp=Oop [16]=Oop }
                                                            ;*if_icmpge {reexecute=1 rethrow=0 return_oop=0}
                                                            ; - (reexecute) java.util.random.RandomGenerator::nextBytes@63 (line 489)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c9530:   nopl   0x5000920(%rax,%rax,1)       ;   {other}
  0x00007f09c84c9538:   mov    $0xfffffff6,%esi
  0x00007f09c84c953d:   xchg   %ax,%ax
  0x00007f09c84c953f:   call   0x00007f09c7da9c00           ; ImmutableOopMap {}
                                                            ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - java.util.random.RandomGenerator::nextBytes@3 (line 483)
                                                            ;   {runtime_call UncommonTrapBlob}
  0x00007f09c84c9544:   nopl   0x6000934(%rax,%rax,1)       ;   {other}
  0x00007f09c84c954c:   movabs $0x7f09c84c9468,%r10         ;   {internal_word}
  0x00007f09c84c9556:   mov    %r10,0x468(%r15)
  0x00007f09c84c955d:   jmp    0x00007f09c7daad00           ;   {runtime_call SafepointBlob}
  0x00007f09c84c9562:   call   Stub::nmethod_entry_barrier  ;   {runtime_call StubRoutines (final stubs)}
  0x00007f09c84c9567:   jmp    0x00007f09c84c8dda
  0x00007f09c84c956c:   hlt
  0x00007f09c84c956d:   hlt
  0x00007f09c84c956e:   hlt
  0x00007f09c84c956f:   hlt
[Exception Handler]
  0x00007f09c84c9570:   jmp    0x00007f09c7e6b100           ;   {no_reloc}
[Deopt Handler Code]
  0x00007f09c84c9575:   call   0x00007f09c84c957a
  0x00007f09c84c957a:   subq   $0x5,(%rsp)
  0x00007f09c84c957f:   jmp    0x00007f09c7da9fa0           ;   {runtime_call DeoptimizationBlob}
  0x00007f09c84c9584:   hlt
  0x00007f09c84c9585:   hlt
  0x00007f09c84c9586:   hlt
  0x00007f09c84c9587:   hlt
--------------------------------------------------------------------------------
[/Disassembly]

I'm not familiar with assembly, but I guess loop unrolling is working?

Glavo avatar Jun 25 '23 07:06 Glavo

I'm not sure either, but there are vmovd and vmovq instructions, which move double or quad words at once, so it appears to be vectorized. Still, using Unsafe/ByteArrayLittleEndian explicitly seems better optimized according to your results; I guess that might be because you know the size of the random input (int/long). Can you try putLongUnaligned etc. and see how they perform? The VH implementation delegates to the unaligned versions (for plain get/set).
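For reference, here is a minimal sketch of the kind of multi-byte write being discussed. It uses the public `MethodHandles.byteArrayViewVarHandle` API as a stand-in, since `jdk.internal.misc.Unsafe` and `ByteArrayLittleEndian` are JDK-internal; the class and method names (`LittleEndianWrite`, `putLongByteWise`, `putLongVarHandle`) are made up for illustration and are not taken from the patch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

class LittleEndianWrite {
    // View a byte[] as little-endian longs; the index argument is a byte offset.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Reference: store the 8 bytes of value one at a time, lowest byte first.
    static void putLongByteWise(byte[] dst, int offset, long value) {
        for (int n = 0; n < Long.BYTES; n++, value >>>= Byte.SIZE) {
            dst[offset + n] = (byte) value;
        }
    }

    // Same bytes, written with a single bounds-checked 8-byte store.
    // Plain set on a byte[] view does not require an aligned offset.
    static void putLongVarHandle(byte[] dst, int offset, long value) {
        LONG_LE.set(dst, offset, value);
    }

    public static void main(String[] args) {
        byte[] a = new byte[11];
        byte[] b = new byte[11];
        putLongByteWise(a, 3, 0x0102030405060708L);
        putLongVarHandle(b, 3, 0x0102030405060708L);
        System.out.println("same bytes: " + Arrays.equals(a, b));
    }
}
```

Both methods produce the same array contents; the difference under discussion is only how many store instructions the JIT ends up emitting for them.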

liach avatar Jun 25 '23 07:06 liach

Use Unsafe::putIntUnaligned/Unsafe::putLongUnaligned:

Results
Benchmark                        (length)   Mode  Cnt        Score       Error   Units
RandomBenchmark.L32X64MixRandom         0  thrpt    5  1524860.609 ± 25736.732  ops/ms
RandomBenchmark.L32X64MixRandom         1  thrpt    5   215406.292 ±  2337.313  ops/ms
RandomBenchmark.L32X64MixRandom         2  thrpt    5   201345.579 ±   940.754  ops/ms
RandomBenchmark.L32X64MixRandom         3  thrpt    5   191355.115 ±  1805.205  ops/ms
RandomBenchmark.L32X64MixRandom         4  thrpt    5   184609.621 ±  1255.979  ops/ms
RandomBenchmark.L32X64MixRandom         5  thrpt    5   164468.663 ±  1677.559  ops/ms
RandomBenchmark.L32X64MixRandom         6  thrpt    5   156960.655 ±   663.464  ops/ms
RandomBenchmark.L32X64MixRandom         7  thrpt    5   153595.183 ±  3983.234  ops/ms
RandomBenchmark.L32X64MixRandom         8  thrpt    5   186632.617 ±   425.385  ops/ms
RandomBenchmark.L32X64MixRandom        10  thrpt    5   104736.408 ±   345.176  ops/ms
RandomBenchmark.L32X64MixRandom        12  thrpt    5   105447.874 ±   399.328  ops/ms
RandomBenchmark.L32X64MixRandom        14  thrpt    5    95664.265 ±    80.052  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5   109343.697 ±    32.207  ops/ms
RandomBenchmark.L32X64MixRandom        32  thrpt    5    62252.931 ±   469.271  ops/ms
RandomBenchmark.L32X64MixRandom        64  thrpt    5    31358.265 ±    89.965  ops/ms
RandomBenchmark.L32X64MixRandom       128  thrpt    5    16607.450 ±    70.292  ops/ms
RandomBenchmark.L32X64MixRandom       256  thrpt    5     8327.905 ±     9.349  ops/ms
RandomBenchmark.L32X64MixRandom       512  thrpt    5     4379.807 ±     9.959  ops/ms
RandomBenchmark.L32X64MixRandom      1024  thrpt    5     2169.190 ±     0.127  ops/ms
RandomBenchmark.L32X64MixRandom      2048  thrpt    5     1081.397 ±    64.131  ops/ms
RandomBenchmark.L32X64MixRandom      4096  thrpt    5      546.185 ±     0.895  ops/ms
RandomBenchmark.L32X64MixRandom      8192  thrpt    5      273.206 ±     0.236  ops/ms
RandomBenchmark.Random                  0  thrpt    5  1523782.776 ± 11592.739  ops/ms
RandomBenchmark.Random                  1  thrpt    5   364587.781 ± 23904.474  ops/ms
RandomBenchmark.Random                  2  thrpt    5   324850.835 ±  1698.265  ops/ms
RandomBenchmark.Random                  3  thrpt    5   290855.010 ±  3524.691  ops/ms
RandomBenchmark.Random                  4  thrpt    5   286867.826 ±    58.331  ops/ms
RandomBenchmark.Random                  5  thrpt    5   151454.671 ±   525.393  ops/ms
RandomBenchmark.Random                  6  thrpt    5   147070.562 ±  1477.003  ops/ms
RandomBenchmark.Random                  7  thrpt    5   138053.754 ±   151.065  ops/ms
RandomBenchmark.Random                  8  thrpt    5   154585.711 ±  1495.177  ops/ms
RandomBenchmark.Random                 10  thrpt    5    92987.135 ±  1284.808  ops/ms
RandomBenchmark.Random                 12  thrpt    5   102440.798 ±   204.633  ops/ms
RandomBenchmark.Random                 14  thrpt    5    76235.547 ±    64.113  ops/ms
RandomBenchmark.Random                 16  thrpt    5    77672.178 ±    28.365  ops/ms
RandomBenchmark.Random                 32  thrpt    5    39193.225 ±    40.209  ops/ms
RandomBenchmark.Random                 64  thrpt    5    19684.798 ±     7.152  ops/ms
RandomBenchmark.Random                128  thrpt    5     9884.926 ±     1.765  ops/ms
RandomBenchmark.Random                256  thrpt    5     4862.050 ±     1.655  ops/ms
RandomBenchmark.Random                512  thrpt    5     2457.171 ±     1.042  ops/ms
RandomBenchmark.Random               1024  thrpt    5     1228.285 ±     0.736  ops/ms
RandomBenchmark.Random               2048  thrpt    5      615.795 ±     0.977  ops/ms
RandomBenchmark.Random               4096  thrpt    5      311.657 ±     0.124  ops/ms
RandomBenchmark.Random               8192  thrpt    5      152.179 ±     0.031  ops/ms

Use ByteArrayLittleEndian (#14636):

Results

```
Benchmark                        (length)   Mode  Cnt        Score       Error   Units
RandomBenchmark.L32X64MixRandom         0  thrpt    5  1528297.256 ± 11983.204  ops/ms
RandomBenchmark.L32X64MixRandom         1  thrpt    5   215656.684 ±  1794.981  ops/ms
RandomBenchmark.L32X64MixRandom         2  thrpt    5   201420.705 ±  1377.903  ops/ms
RandomBenchmark.L32X64MixRandom         3  thrpt    5   190722.759 ±  3562.388  ops/ms
RandomBenchmark.L32X64MixRandom         4  thrpt    5   184578.897 ±   587.992  ops/ms
RandomBenchmark.L32X64MixRandom         5  thrpt    5   164248.972 ±  1153.358  ops/ms
RandomBenchmark.L32X64MixRandom         6  thrpt    5   145869.045 ±  1342.215  ops/ms
RandomBenchmark.L32X64MixRandom         7  thrpt    5   153291.149 ±  4666.694  ops/ms
RandomBenchmark.L32X64MixRandom         8  thrpt    5   163664.923 ±   559.088  ops/ms
RandomBenchmark.L32X64MixRandom        10  thrpt    5   101878.885 ±   322.857  ops/ms
RandomBenchmark.L32X64MixRandom        12  thrpt    5    98918.245 ±   305.201  ops/ms
RandomBenchmark.L32X64MixRandom        14  thrpt    5    95554.296 ±   253.037  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5   114686.083 ±    10.662  ops/ms
RandomBenchmark.L32X64MixRandom        32  thrpt    5    54694.191 ±    77.666  ops/ms
RandomBenchmark.L32X64MixRandom        64  thrpt    5    29272.233 ±    13.130  ops/ms
RandomBenchmark.L32X64MixRandom       128  thrpt    5    15423.642 ±    13.856  ops/ms
RandomBenchmark.L32X64MixRandom       256  thrpt    5     8007.269 ±     6.237  ops/ms
RandomBenchmark.L32X64MixRandom       512  thrpt    5     4035.672 ±     1.192  ops/ms
RandomBenchmark.L32X64MixRandom      1024  thrpt    5     2389.270 ±     1.732  ops/ms
RandomBenchmark.L32X64MixRandom      2048  thrpt    5     1210.966 ±     0.645  ops/ms
RandomBenchmark.L32X64MixRandom      4096  thrpt    5      609.226 ±     0.026  ops/ms
RandomBenchmark.L32X64MixRandom      8192  thrpt    5      305.380 ±     0.147  ops/ms
RandomBenchmark.Random                  0  thrpt    5  1519068.332 ± 17554.468  ops/ms
RandomBenchmark.Random                  1  thrpt    5   349320.420 ± 50935.172  ops/ms
RandomBenchmark.Random                  2  thrpt    5   325239.890 ±  1852.854  ops/ms
RandomBenchmark.Random                  3  thrpt    5   293215.822 ±  5502.425  ops/ms
RandomBenchmark.Random                  4  thrpt    5   270030.002 ±   635.288  ops/ms
RandomBenchmark.Random                  5  thrpt    5   135824.338 ±  1411.090  ops/ms
RandomBenchmark.Random                  6  thrpt    5   131045.378 ±   131.826  ops/ms
RandomBenchmark.Random                  7  thrpt    5   123870.748 ±   281.168  ops/ms
RandomBenchmark.Random                  8  thrpt    5   159068.553 ±   577.367  ops/ms
RandomBenchmark.Random                 10  thrpt    5    97813.949 ±   133.771  ops/ms
RandomBenchmark.Random                 12  thrpt    5   104909.089 ±    54.468  ops/ms
RandomBenchmark.Random                 14  thrpt    5    75004.214 ±   237.386  ops/ms
RandomBenchmark.Random                 16  thrpt    5    78205.257 ±    91.166  ops/ms
RandomBenchmark.Random                 32  thrpt    5    39289.218 ±    24.475  ops/ms
RandomBenchmark.Random                 64  thrpt    5    19676.129 ±     8.671  ops/ms
RandomBenchmark.Random                128  thrpt    5     9856.330 ±     1.669  ops/ms
RandomBenchmark.Random                256  thrpt    5     4928.997 ±     1.652  ops/ms
RandomBenchmark.Random                512  thrpt    5     2429.244 ±     2.227  ops/ms
RandomBenchmark.Random               1024  thrpt    5     1239.338 ±     0.306  ops/ms
RandomBenchmark.Random               2048  thrpt    5      619.758 ±     0.055  ops/ms
RandomBenchmark.Random               4096  thrpt    5      274.033 ±     0.714  ops/ms
RandomBenchmark.Random               8192  thrpt    5      151.607 ±     0.013  ops/ms
```

The result seems interesting.

Glavo avatar Jun 25 '23 13:06 Glavo

The new implementation of ByteArrayLittleEndian in #14636 performs consistently with the old implementation using VarHandle. (This conclusion gives me more confidence in #14636)

Interestingly, Unsafe::putIntUnaligned/Unsafe::putLongUnaligned is not always faster than the new implementation of ByteArrayLittleEndian, even though it does not have additional bounds checking.
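To make the comparison concrete, the small sketch below converts a few of the L32X64MixRandom scores quoted above into bytes per second and a relative speedup. The helper names (`ThroughputCompare`, `bytesPerSecond`, `speedup`) are hypothetical; only the four scores come from the result tables in this thread.

```java
class ThroughputCompare {
    // Convert a JMH throughput score in ops/ms into bytes generated per second.
    static double bytesPerSecond(double opsPerMs, int length) {
        return opsPerMs * 1_000.0 * length;
    }

    // Relative speed of a candidate over a baseline (1.0 means equal).
    static double speedup(double candidateOpsPerMs, double baselineOpsPerMs) {
        return candidateOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // L32X64MixRandom scores taken from the two result tables above.
        double unsafe8192 = 273.206;
        double byteArray8192 = 305.380;
        double unsafe8 = 186632.617;
        double byteArray8 = 163664.923;

        System.out.printf("length 8192: Unsafe %.2f GB/s, ByteArrayLittleEndian %.2f GB/s%n",
                bytesPerSecond(unsafe8192, 8192) / 1e9,
                bytesPerSecond(byteArray8192, 8192) / 1e9);
        System.out.printf("ByteArrayLittleEndian vs Unsafe at 8192: x%.3f%n",
                speedup(byteArray8192, unsafe8192));
        System.out.printf("ByteArrayLittleEndian vs Unsafe at 8:    x%.3f%n",
                speedup(byteArray8, unsafe8));
    }
}
```

A ratio above 1.0 at length 8192 and below 1.0 at length 8 is what "not always faster" refers to here.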

Glavo avatar Jun 25 '23 13:06 Glavo

Can you publish your put_Unaligned code and the one with the updated ByteArrayLittleEndian as two branches in your fork? I suspect something might be off in your code, and I'd like to test it on my end.

liach avatar Jun 25 '23 13:06 liach

Can you publish your put_Unaligned code and the one with the updated ByteArrayLittleEndian as two branches in your fork? I suspect something might be off in your code, and I'd like to test it on my end.

Use ByteArrayLittleEndian: https://github.com/Glavo/jdk/tree/random-byte-array

Use putXxxUnaligned: https://github.com/Glavo/jdk/tree/random-unaligned

This is my test server:

glavo@minecraft-server
----------------------
OS: Ubuntu 20.04.6 LTS x86_64
Kernel: 5.15.0-71-generic
Uptime: 10 days, 2 hours, 42 mins
Packages: 2165 (dpkg), 13 (snap)
Shell: bash 5.0.17
Terminal: /dev/pts/2
CPU: AMD Ryzen 7 5800X (16) @ 4.600GHz
GPU: NVIDIA GeForce GT 710
Memory: 12570MiB / 32011MiB

I need to spend the day updating my server and upgrading some accessories tomorrow. If you are unable to replicate my previous JMH results, I will rerun all tests after upgrading the server.

Glavo avatar Jun 25 '23 14:06 Glavo

@SirYwell

  1. I didn't find any proper tests that ensure that the behavior described in the Javadocs is actually maintained

I updated test/jdk/java/util/Random/NextBytes.java to also test RandomGenerator::nextBytes.
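As an illustration of what such a behavioral check can look like (this is not the content of NextBytes.java, which is not shown in this thread), the sketch below compares `java.util.Random::nextBytes` against the byte sequence its Javadoc specifies: each `nextInt()` supplies up to four bytes, lowest byte first. The class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Random;

class NextBytesContractCheck {
    // Byte sequence specified by the Random::nextBytes Javadoc:
    // every nextInt() call supplies up to four bytes, lowest byte first.
    static byte[] expected(long seed, int length) {
        Random rnd = new Random(seed);
        byte[] out = new byte[length];
        for (int i = 0; i < length; ) {
            int word = rnd.nextInt();
            for (int n = Math.min(length - i, 4); n-- > 0; word >>= Byte.SIZE) {
                out[i++] = (byte) word;
            }
        }
        return out;
    }

    // What the implementation under test actually produces for the same seed.
    static byte[] actual(long seed, int length) {
        byte[] out = new byte[length];
        new Random(seed).nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        for (int length = 0; length <= 64; length++) {
            if (!Arrays.equals(expected(42L, length), actual(42L, length))) {
                throw new AssertionError("mismatch at length " + length);
            }
        }
        System.out.println("nextBytes matches the documented sequence");
    }
}
```

An optimized implementation has to keep passing this kind of check for every length, including lengths that are not a multiple of the word size.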

  1. I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

I just did a quick search for nextBytes inside the JDK. In fact, there are many use cases for Random::nextBytes.

For example, in ZipEntryFreeTest, it is used to fill ten arrays of length 2,000,000:

https://github.com/openjdk/jdk/blob/0db63ec76d451295e273c8e3272d013e2c3348ef/test/jdk/java/util/zip/ZipFile/ZipEntryFreeTest.java#L77-L87

It is widely used in unit testing to generate random test data. Optimizing it can help developers reduce the time spent running tests.

Glavo avatar Jun 25 '23 23:06 Glavo

Running the benchmark in the patch, using Unsafe put:

Benchmark                                        (algo)  (length)   Mode  Cnt       Score        Error   Units
RandomGeneratorNextBytes.testNextBytes           Random         1  thrpt   12  139652.882 ±   1352.622  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         2  thrpt   12  140331.882 ±   1282.855  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         3  thrpt   12  139557.391 ±   1175.079  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         4  thrpt   12  138449.059 ±   1322.123  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         6  thrpt   12   71200.906 ±    951.863  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         7  thrpt   12   72158.561 ±    334.151  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         9  thrpt   12   48154.027 ±    209.683  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random        10  thrpt   12   44386.601 ±   7802.001  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random        48  thrpt   12   11711.709 ±     51.540  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random       512  thrpt   12     984.135 ±    114.699  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random      1000  thrpt   12     499.867 ±     75.867  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         1  thrpt   12  291444.618 ±    908.968  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         2  thrpt   12  279086.952 ±    785.740  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         3  thrpt   12  272168.427 ±   1179.134  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         4  thrpt   12  225741.559 ± 108229.595  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         6  thrpt   12   93584.669 ±   4203.102  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         7  thrpt   12   94964.676 ±  14241.917  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         9  thrpt   12  145814.460 ±    464.698  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom        10  thrpt   12  142188.443 ±    753.706  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom        48  thrpt   12   55356.994 ±    142.517  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom       512  thrpt   12    5963.217 ±     33.529  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom      1000  thrpt   12    2964.744 ±     23.344  ops/ms

Using ByteArrayLittleEndian:

Benchmark                                        (algo)  (length)   Mode  Cnt       Score       Error   Units
RandomGeneratorNextBytes.testNextBytes           Random         1  thrpt   12  139322.101 ±  1176.140  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         2  thrpt   12   99385.014 ± 38348.493  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         3  thrpt   12   81765.495 ±  2291.560  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         4  thrpt   12   89411.431 ± 31054.806  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         6  thrpt   12   42040.396 ±  3441.116  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         7  thrpt   12   38358.942 ±  2379.015  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random         9  thrpt   12   31104.518 ±  1299.442  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random        10  thrpt   12   28871.366 ±  1634.108  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random        48  thrpt   12    8907.501 ±   506.556  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random       512  thrpt   12     871.237 ±    80.083  ops/ms
RandomGeneratorNextBytes.testNextBytes           Random      1000  thrpt   12     432.025 ±    28.046  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         1  thrpt   12  120484.263 ±  4843.672  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         2  thrpt   12  107474.776 ±  4915.841  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         3  thrpt   12  104832.882 ±  9039.199  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         4  thrpt   12  106411.957 ±  4440.441  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         6  thrpt   12   97407.747 ± 13756.916  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         7  thrpt   12   87383.519 ±  3631.554  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom         9  thrpt   12   53060.202 ±  2519.723  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom        10  thrpt   12   48562.023 ±  3260.066  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom        48  thrpt   12   23700.122 ±  1973.472  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom       512  thrpt   12    2635.817 ±   129.181  ops/ms
RandomGeneratorNextBytes.testNextBytes  L32X64MixRandom      1000  thrpt   12    1415.250 ±    55.821  ops/ms

The benchmark results are somewhat weird: L32X64MixRandom has drastically different results even for small sizes that don't involve multi-byte writes.

liach avatar Jun 26 '23 02:06 liach

The benchmark results are somewhat weird: L32X64MixRandom has drastically different results even for small sizes that don't involve multi-byte writes.

These are unbelievable results.

In your results, even for small byte arrays (length < 8), ByteArrayLittleEndian is much slower than Unsafe. This is very strange, because neither Unsafe nor ByteArrayLittleEndian is actually called for small byte arrays.
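The reasoning can be sketched as follows, assuming the patched method keeps the shape of the default RandomGenerator::nextBytes (a loop over whole 8-byte words, then a byte-wise tail). This is not the code from either branch: the names `WordWiseFill` and `fill` are hypothetical, and a public VarHandle stands in for the internal Unsafe/ByteArrayLittleEndian call.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.function.LongSupplier;

class WordWiseFill {
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Fills bytes from the given source of random longs and returns
    // how many 8-byte stores were issued.
    static int fill(byte[] bytes, LongSupplier nextLong) {
        int i = 0;
        int len = bytes.length;
        int wideStores = 0;
        // Whole 8-byte words: one wide store per random long.
        for (int words = len >> 3; words-- > 0; i += Long.BYTES) {
            LONG_LE.set(bytes, i, nextLong.getAsLong());
            wideStores++;
        }
        // Tail of 1..7 bytes: plain byte stores, lowest byte first.
        if (i < len) {
            for (long rnd = nextLong.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        return wideStores;
    }

    public static void main(String[] args) {
        long[] state = {0};
        // Simple deterministic stand-in for a random generator.
        LongSupplier source = () -> state[0] += 0x9E3779B97F4A7C15L;
        for (int length : new int[] {1, 7, 8, 20}) {
            int wide = fill(new byte[length], source);
            System.out.println("length " + length + ": " + wide + " wide stores");
        }
    }
}
```

For any length below 8 the word loop runs zero times, so only the byte-wise tail executes; under that assumption a difference between the two branches at those sizes would have to come from somewhere other than the multi-byte write itself.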

Glavo avatar Jun 26 '23 05:06 Glavo

On a side note, I think you can become an author in the JDK project: https://openjdk.org/guide/#becoming-an-author

liach avatar Jun 28 '23 02:06 liach

@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jul 26 '23 05:07 bridgekeeper[bot]

@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper[bot] avatar Aug 23 '23 05:08 bridgekeeper[bot]

/open

Glavo avatar Aug 23 '23 05:08 Glavo

@Glavo This pull request is now open

openjdk[bot] avatar Aug 23 '23 06:08 openjdk[bot]

@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Sep 20 '23 07:09 bridgekeeper[bot]

@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper[bot] avatar Oct 18 '23 11:10 bridgekeeper[bot]

/open

Glavo avatar Dec 23 '23 19:12 Glavo