Optimize RandomGenerator::nextBytes
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [ ] Commit message must refer to an issue
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14638/head:pull/14638
$ git checkout pull/14638
Update a local copy of the PR:
$ git checkout pull/14638
$ git pull https://git.openjdk.org/jdk.git pull/14638/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14638
View PR using the GUI difftool:
$ git pr show -t 14638
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14638.diff
:wave: Welcome back Glavo! A progress list of the required criteria for merging this PR into `master` will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@Glavo The following label will be automatically applied to this pull request:
- core-libs
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Here are the results for the `RandomGeneratorNextBytes` benchmark (this comment will be updated to show the latest results):
Benchmark (algo) (length) Mode Cnt Baseline: Score Error Units This PR: Score Error Units
RandomGeneratorNextBytes.testNextBytes Random 1 thrpt 5 292124.677 ± 6377.255 ops/ms 346221.250 ± 86860.488 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 2 thrpt 5 261235.014 ± 15707.040 ops/ms 323470.739 ± 16084.063 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 4 thrpt 5 240194.023 ± 4417.534 ops/ms 286154.793 ± 2162.091 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 8 thrpt 5 120707.831 ± 5701.440 ops/ms 156008.005 ± 128.043 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 16 thrpt 5 63594.497 ± 438.139 ops/ms 78236.080 ± 15.013 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 32 thrpt 5 35420.287 ± 427.508 ops/ms 39262.435 ± 18.943 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 64 thrpt 5 17651.831 ± 25.639 ops/ms 19688.311 ± 19.507 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 128 thrpt 5 8554.908 ± 19.695 ops/ms 9887.630 ± 6.683 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 256 thrpt 5 4560.283 ± 27.455 ops/ms 4874.348 ± 3.856 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 1024 thrpt 5 1161.771 ± 2.053 ops/ms 1242.620 ± 0.311 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 4096 thrpt 5 294.610 ± 0.764 ops/ms 309.557 ± 0.131 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 16384 thrpt 5 73.885 ± 0.055 ops/ms 77.973 ± 0.038 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 1 thrpt 5 214239.266 ± 1103.018 ops/ms 215641.075 ± 1901.826 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 2 thrpt 5 199700.840 ± 465.203 ops/ms 201313.181 ± 1069.213 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 4 thrpt 5 184605.447 ± 1057.641 ops/ms 184081.550 ± 1068.982 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 8 thrpt 5 144195.042 ± 2155.839 ops/ms 166970.270 ± 62.509 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 16 thrpt 5 92010.333 ± 272.006 ops/ms 90731.669 ± 1179.712 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 32 thrpt 5 45378.019 ± 487.964 ops/ms 54470.769 ± 789.986 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 64 thrpt 5 24958.803 ± 57.066 ops/ms 29271.323 ± 62.528 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 128 thrpt 5 12967.609 ± 30.151 ops/ms 15460.181 ± 50.493 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 256 thrpt 5 6620.502 ± 8.294 ops/ms 7974.591 ± 20.440 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 1024 thrpt 5 1670.174 ± 14.304 ops/ms 2391.758 ± 1.891 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 4096 thrpt 5 415.035 ± 0.771 ops/ms 609.107 ± 0.279 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 16384 thrpt 5 103.704 ± 0.013 ops/ms 152.771 ± 0.270 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 1 thrpt 5 378919.462 ± 20733.749 ops/ms 382509.180 ± 418.348 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 2 thrpt 5 352209.019 ± 340.381 ops/ms 346027.427 ± 2979.327 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 4 thrpt 5 327951.428 ± 172.418 ops/ms 327855.763 ± 280.082 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 8 thrpt 5 269875.472 ± 48.783 ops/ms 229580.541 ± 24.469 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 16 thrpt 5 157786.908 ± 363.565 ops/ms 183664.801 ± 19.788 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 32 thrpt 5 85927.731 ± 1988.607 ops/ms 135010.073 ± 12.742 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 64 thrpt 5 45121.367 ± 113.888 ops/ms 90891.031 ± 51.981 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 128 thrpt 5 23266.361 ± 83.143 ops/ms 52998.113 ± 527.246 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 256 thrpt 5 10845.534 ± 23.174 ops/ms 29423.939 ± 10.840 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 1024 thrpt 5 2724.955 ± 1.782 ops/ms 7910.042 ± 175.002 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 4096 thrpt 5 744.280 ± 0.064 ops/ms 2064.625 ± 0.646 ops/ms
RandomGeneratorNextBytes.testNextBytes Xoshiro256PlusPlus 16384 thrpt 5 186.613 ± 0.012 ops/ms 573.580 ± 5.850 ops/ms
This PR significantly improves performance for the default implementation in `RandomGenerator`.
For the `Xoshiro256PlusPlus` algorithm, when the target array is large, the throughput with this PR is 3.07 times that of the baseline (573.580 vs. 186.613 ops/ms at length 16384).
The only confirmed performance degradation (<5%) is when the byte array is empty.
For byte arrays with a length greater than 4 (or 8 for `RandomGenerator`), we often see a performance improvement of 10% to 30%.
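
To make the idea behind these numbers concrete, here is a minimal sketch of the difference between filling the array one byte at a time and one 8-byte word at a time. It is not the code in this PR: the class and method names are hypothetical, and it assumes a little-endian `VarHandle` view from `MethodHandles.byteArrayViewVarHandle` as a stand-in for whatever low-level access the PR actually uses. Both methods should produce identical bytes for the same sequence of `nextLong()` values.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.SplittableRandom;
import java.util.function.LongSupplier;

// Hypothetical illustration class; not part of the JDK or of this PR.
final class NextBytesSketch {
    // Little-endian long view over a byte[]; the index passed to set() is a byte offset.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Byte-at-a-time fill: eight single-byte stores per random long, low byte first. */
    static void fillBytewise(LongSupplier nextLong, byte[] bytes) {
        int i = 0;
        int len = bytes.length;
        for (int words = len >> 3; words-- > 0; ) {
            long rnd = nextLong.getAsLong();
            for (int n = 8; n-- > 0; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        if (i < len) {
            // Tail: fewer than 8 bytes remain, taken from one more random long.
            for (long rnd = nextLong.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    /** Word-at-a-time fill: one 8-byte store per random long, same byte order. */
    static void fillWordwise(LongSupplier nextLong, byte[] bytes) {
        int i = 0;
        int len = bytes.length;
        for (int end = len & ~7; i < end; i += Long.BYTES) {
            LONG_LE.set(bytes, i, nextLong.getAsLong());
        }
        if (i < len) {
            // The tail is handled exactly as in the byte-at-a-time version.
            for (long rnd = nextLong.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    public static void main(String[] args) {
        for (int len : new int[] {0, 1, 7, 8, 9, 16, 37}) {
            byte[] a = new byte[len];
            byte[] b = new byte[len];
            // Two generators with the same seed yield the same nextLong() sequence.
            fillBytewise(new SplittableRandom(42)::nextLong, a);
            fillWordwise(new SplittableRandom(42)::nextLong, b);
            System.out.println("length " + len + ": equal = " + Arrays.equals(a, b));
        }
    }
}
```

Whether the word-wise variant is actually faster depends on the JIT and the generator, which is what the benchmark above is meant to measure.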
You should probably update the 2 existing benchmarks in https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomNext.java and https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java, or include your benchmark there.
In IntelliJ IDEA, you can add the micro directory as a module and add the JMH Maven library and the JDK modules as compile-time dependencies, so that IntelliJ can help when working on the benchmarks.
Also, should we use ByteArrayLittleEndian instead of Unsafe, once ByteArrayLittleEndian is no longer dependent on VarHandle?
> Also, should we use ByteArrayLittleEndian instead of Unsafe, once ByteArrayLittleEndian is no longer dependent on VarHandle?

Due to the overhead of using non-aligned reads and checking indexes, `ByteArrayLittleEndian` is slower than directly calling `getLong`.
I am running benchmarks based on `ByteArrayLittleEndian`. The current benchmark result is that `Unsafe.getLong` is 13.76% faster than `ByteArrayLittleEndian` for L32X64MixRandom (`bytes.length = 8`).
> Due to the overhead of using non-aligned reads and checking indexes, `ByteArrayLittleEndian` is slower than directly calling `getLong`.

Well, it seems that this is not always correct.
For L32X64MixRandom, when the byte array length is greater than 1024, using `ByteArrayLittleEndian` is actually 10% faster than `getLong`. I don't understand why this is happening.
I think `putLongUnaligned` tries to do an aligned put if it can; I don't know how C1 or C2 handles it. I think it's a win as long as either Unsafe or VarHandle is faster than the existing manual loop (which could already have been vectorized by C2).
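
As a rough illustration of what "put aligned if it can" could mean, here is a sketch that picks the widest store the alignment of the position allows. Everything in it is an assumption made for illustration: the class and method names are hypothetical, it dispatches on the array index rather than on a real memory address (so it ignores the array base offset), and it says nothing about what C1 or C2 actually emit.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; a simplified model, not JDK code.
final class AlignedDispatchSketch {
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    /**
     * Stores x in little-endian order at a[index .. index+7], using the widest
     * store size that the low bits of the index allow. Returns a label naming
     * the path taken, purely so the dispatch can be observed.
     */
    static String putLongLE(byte[] a, int index, long x) {
        if ((index & 7) == 0) {
            LONG_LE.set(a, index, x);                      // one 8-byte store
            return "1 x long";
        } else if ((index & 3) == 0) {
            INT_LE.set(a, index, (int) x);                 // low half
            INT_LE.set(a, index + 4, (int) (x >>> 32));    // high half
            return "2 x int";
        } else if ((index & 1) == 0) {
            for (int k = 0; k < 4; k++) {
                SHORT_LE.set(a, index + 2 * k, (short) (x >>> (16 * k)));
            }
            return "4 x short";
        } else {
            for (int k = 0; k < 8; k++) {
                a[index + k] = (byte) (x >>> (8 * k));
            }
            return "8 x byte";
        }
    }

    public static void main(String[] args) {
        long x = 0x0102030405060708L;
        for (int index : new int[] {0, 4, 2, 1}) {
            System.out.println("index " + index + ": " + putLongLE(new byte[16], index, x));
        }
    }
}
```

All four paths write the same bytes; only the number and width of the stores differ, which is why the cost can vary with the position in the array.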
Benchmarking results based on the current `ByteArrayLittleEndian` (VarHandle):

```
Benchmark                        (length)   Mode  Cnt        Score       Error   Units
RandomBenchmark.L32X64MixRandom         0  thrpt    5  1519005.337 ± 10166.724  ops/ms
RandomBenchmark.L32X64MixRandom         1  thrpt    5   215438.181 ±  1296.270  ops/ms
RandomBenchmark.L32X64MixRandom         2  thrpt    5   203155.966 ±  1102.743  ops/ms
RandomBenchmark.L32X64MixRandom         3  thrpt    5   190993.049 ±  1583.488  ops/ms
RandomBenchmark.L32X64MixRandom         4  thrpt    5   184699.083 ±  1656.026  ops/ms
RandomBenchmark.L32X64MixRandom         5  thrpt    5   164362.211 ±  1688.353  ops/ms
RandomBenchmark.L32X64MixRandom         6  thrpt    5   156946.704 ±  1188.623  ops/ms
RandomBenchmark.L32X64MixRandom         7  thrpt    5   153627.148 ±  2754.413  ops/ms
RandomBenchmark.L32X64MixRandom         8  thrpt    5   164011.508 ±    87.110  ops/ms
RandomBenchmark.L32X64MixRandom        10  thrpt    5   101824.800 ±   183.479  ops/ms
RandomBenchmark.L32X64MixRandom        12  thrpt    5    98005.608 ±   188.852  ops/ms
RandomBenchmark.L32X64MixRandom        14  thrpt    5    95530.799 ±   109.554  ops/ms
RandomBenchmark.L32X64MixRandom        16  thrpt    5   114617.995 ±    51.252  ops/ms
RandomBenchmark.L32X64MixRandom        32  thrpt    5    54787.870 ±    36.547  ops/ms
RandomBenchmark.L32X64MixRandom        64  thrpt    5    29267.303 ±    17.143  ops/ms
RandomBenchmark.L32X64MixRandom       128  thrpt    5    15590.939 ±     5.373  ops/ms
RandomBenchmark.L32X64MixRandom       256  thrpt    5     8001.160 ±     2.425  ops/ms
RandomBenchmark.L32X64MixRandom       512  thrpt    5     4035.970 ±     1.097  ops/ms
RandomBenchmark.L32X64MixRandom      1024  thrpt    5     2390.227 ±     0.317  ops/ms
RandomBenchmark.L32X64MixRandom      2048  thrpt    5     1210.989 ±     0.190  ops/ms
RandomBenchmark.L32X64MixRandom      4096  thrpt    5      609.188 ±     0.051  ops/ms
RandomBenchmark.L32X64MixRandom      8192  thrpt    5      302.962 ±     1.783  ops/ms
RandomBenchmark.Random                  0  thrpt    5  1511686.595 ± 64669.779  ops/ms
RandomBenchmark.Random                  1  thrpt    5   355958.380 ± 33275.649  ops/ms
RandomBenchmark.Random                  2  thrpt    5   322566.151 ±  2151.769  ops/ms
RandomBenchmark.Random                  3  thrpt    5   291901.421 ±  3873.578  ops/ms
RandomBenchmark.Random                  4  thrpt    5   270129.002 ±    19.117  ops/ms
RandomBenchmark.Random                  5  thrpt    5   135856.891 ±   566.745  ops/ms
RandomBenchmark.Random                  6  thrpt    5   130272.051 ±    61.738  ops/ms
RandomBenchmark.Random                  7  thrpt    5   123843.896 ±   107.200  ops/ms
RandomBenchmark.Random                  8  thrpt    5   159297.447 ±    77.475  ops/ms
RandomBenchmark.Random                 10  thrpt    5    97626.041 ±   420.827  ops/ms
RandomBenchmark.Random                 12  thrpt    5   104838.370 ±    52.721  ops/ms
RandomBenchmark.Random                 14  thrpt    5    75077.145 ±   142.321  ops/ms
RandomBenchmark.Random                 16  thrpt    5    78217.212 ±    17.730  ops/ms
RandomBenchmark.Random                 32  thrpt    5    39289.349 ±     5.522  ops/ms
RandomBenchmark.Random                 64  thrpt    5    19673.761 ±    18.463  ops/ms
RandomBenchmark.Random                128  thrpt    5     9856.985 ±     1.844  ops/ms
RandomBenchmark.Random                256  thrpt    5     4928.253 ±     0.684  ops/ms
RandomBenchmark.Random                512  thrpt    5     2431.380 ±     1.006  ops/ms
RandomBenchmark.Random               1024  thrpt    5     1239.599 ±     0.204  ops/ms
RandomBenchmark.Random               2048  thrpt    5      618.926 ±     0.181  ops/ms
RandomBenchmark.Random               4096  thrpt    5      272.700 ±     1.009  ops/ms
RandomBenchmark.Random               8192  thrpt    5      151.693 ±     0.117  ops/ms
```
I looked into that a few months ago too, but didn't get around to actually creating a PR, mainly for the following reasons (besides lack of time):
1. I didn't find any proper tests that ensure that the behavior described in the Javadocs is actually maintained.
2. I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

I think 1. should definitely be addressed. There is `java/util/Random/NextBytes.java` with a very basic test, but it only covers `Random`, and I think a proper test should put the implementation note directly in code.
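
One way to "put the implementation note directly in code" could look like the sketch below. It is only an illustration, not the existing `NextBytes.java` test: the class and method names are hypothetical, and it covers `java.util.Random` only, using the "as if by" loop from the `Random.nextBytes` Javadoc as the reference. A similar check for the `RandomGenerator` default would need its own reference loop.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; not the existing JDK regression test.
final class NextBytesSpecCheck {
    /** Reference fill, written like the "as if by" loop in the Random.nextBytes Javadoc. */
    static void referenceNextBytes(Random random, byte[] bytes) {
        for (int i = 0; i < bytes.length; ) {
            for (int rnd = random.nextInt(), n = Math.min(bytes.length - i, 4);
                 n-- > 0; rnd >>= 8) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    /** True if Random.nextBytes matches the reference loop for this seed and length. */
    static boolean matchesSpec(long seed, int length) {
        byte[] expected = new byte[length];
        byte[] actual = new byte[length];
        // Two Random instances with the same seed produce the same nextInt() sequence.
        referenceNextBytes(new Random(seed), expected);
        new Random(seed).nextBytes(actual);
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        // Cover every length up to 1024 so that all tail sizes (0 to 3 bytes) are exercised.
        for (int length = 0; length <= 1024; length++) {
            if (!matchesSpec(12345L, length)) {
                throw new AssertionError("nextBytes deviates from the documented loop at length " + length);
            }
        }
        System.out.println("all lengths match the documented behavior");
    }
}
```

Because the reference loop is independent of how `nextBytes` is implemented internally, a check like this would keep passing only if an optimized implementation still produces the documented byte sequence.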
> I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.

Personally, I often use it to generate some data for unit testing, so improving its performance would be helpful to me.
> I think 1. should definitely be addressed. There is `java/util/Random/NextBytes.java` with a very basic test, but it only covers `Random` and I think a proper test should put the implementation note directly in code.

I agree.
Looking into the baseline results:
RandomBenchmark.L32X64MixRandom 14 thrpt 5 88666.991 ± 247.778 ops/ms
RandomBenchmark.L32X64MixRandom 16 thrpt 5 94277.271 ± 661.097 ops/ms <-- significantly higher than 14
RandomBenchmark.Random 6 thrpt 5 121245.951 ± 1767.579 ops/ms
RandomBenchmark.Random 7 thrpt 5 124512.260 ± 2239.107 ops/ms <-- higher than 6
RandomBenchmark.Random 8 thrpt 5 103982.515 ± 2052.329 ops/ms
Auto-vectorization might already be at work for RandomGenerator. We need to prove that the optimization offered by Unsafe putLong and VarHandle is reliable rather than an unreliable side effect of the JIT, and that's why I am hesitant to create a JBS issue.

> Auto-vectorization might already be at work for RandomGenerator. We need to prove that the optimization offered by Unsafe putLong and VarHandle is reliable rather than an unreliable side effect of the JIT, and that's why I am hesitant to create a JBS issue.

Disassembly (baseline)
============================= C2-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------
Compiled method (c2) 155 693 4 java.util.random.RandomGenerator::nextBytes (100 bytes)
total in heap [0x00007f09c84c8c10,0x00007f09c84c9708] = 2808
relocation [0x00007f09c84c8d68,0x00007f09c84c8da0] = 56
main code [0x00007f09c84c8da0,0x00007f09c84c9570] = 2000
stub code [0x00007f09c84c9570,0x00007f09c84c9588] = 24
oops [0x00007f09c84c9588,0x00007f09c84c9590] = 8
metadata [0x00007f09c84c9590,0x00007f09c84c95a0] = 16
scopes data [0x00007f09c84c95a0,0x00007f09c84c9660] = 192
scopes pcs [0x00007f09c84c9660,0x00007f09c84c96f0] = 144
dependencies [0x00007f09c84c96f0,0x00007f09c84c96f8] = 8
nul chk table [0x00007f09c84c96f8,0x00007f09c84c9708] = 16
[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]
--------------------------------------------------------------------------------
[Entry Point]
# {method} {0x000000080017fdf8} 'nextBytes' '([B)V' in 'java/util/random/RandomGenerator'
# this: rsi:rsi = 'java/util/random/RandomGenerator'
# parm0: rdx:rdx = '[B'
# [sp+0x40] (sp of caller)
0x00007f09c84c8da0: mov 0x8(%rsi),%r10d
0x00007f09c84c8da4: movabs $0x800000000,%r11
0x00007f09c84c8dae: add %r11,%r10
0x00007f09c84c8db1: cmp %r10,%rax
0x00007f09c84c8db4: jne 0x00007f09c7da3d80 ; {runtime_call ic_miss_stub}
0x00007f09c84c8dba: xchg %ax,%ax
0x00007f09c84c8dbc: nopl 0x0(%rax)
[Verified Entry Point]
0x00007f09c84c8dc0: mov %eax,-0x14000(%rsp)
0x00007f09c84c8dc7: push %rbp
0x00007f09c84c8dc8: sub $0x30,%rsp
0x00007f09c84c8dcc: cmpl $0x1,0x20(%r15)
0x00007f09c84c8dd4: jne 0x00007f09c84c9562
0x00007f09c84c8dda: vmovq %rsi,%xmm0
0x00007f09c84c8ddf: mov 0xc(%rdx),%r9d ; implicit exception: dispatches to 0x00007f09c84c9538
0x00007f09c84c8de3: mov %r9d,%ebp
0x00007f09c84c8de6: sar $0x3,%ebp
0x00007f09c84c8de9: xor %edi,%edi
0x00007f09c84c8deb: test %ebp,%ebp
0x00007f09c84c8ded: jle 0x00007f09c84c945a
0x00007f09c84c8df3: mov 0x8(%rsi),%r10d
0x00007f09c84c8df7: lea -0x1(%rbp),%r8d
0x00007f09c84c8dfb: cmp $0x1021990,%r10d ; {metadata('jdk/random/L32X64MixRandom')}
0x00007f09c84c8e02: jne 0x00007f09c84c94ec
0x00007f09c84c8e08: mov %rsi,%rax
0x00007f09c84c8e0b: mov 0x10(%rax),%r10d
0x00007f09c84c8e0f: mov 0x18(%rax),%esi
0x00007f09c84c8e12: mov 0x14(%rax),%r11d
0x00007f09c84c8e16: mov 0xc(%rax),%ecx
0x00007f09c84c8e19: add $0xfffffffe,%ebp
0x00007f09c84c8e1c: movslq %r9d,%r13
0x00007f09c84c8e1f: vmovq %rdx,%xmm3
0x00007f09c84c8e24: mov %r9d,0xc(%rsp)
0x00007f09c84c8e29: vmovd %r8d,%xmm1
0x00007f09c84c8e2e: mov %ecx,(%rsp)
0x00007f09c84c8e31: mov %r13,0x10(%rsp)
0x00007f09c84c8e36: xor %r11d,%esi
0x00007f09c84c8e39: lea (%r11,%r10,1),%ecx
0x00007f09c84c8e3d: lea 0x7(%rdi),%r14d
0x00007f09c84c8e41: mov %ecx,%r9d
0x00007f09c84c8e44: shr $0x10,%r9d
0x00007f09c84c8e48: xor %ecx,%r9d
0x00007f09c84c8e4b: movslq %r14d,%rdx
0x00007f09c84c8e4e: imul $0xd36d884b,%r9d,%r9d
0x00007f09c84c8e55: add $0xfffffffffffffff9,%rdx
0x00007f09c84c8e59: mov %r9d,%ecx
0x00007f09c84c8e5c: shr $0x10,%ecx
0x00007f09c84c8e5f: xor %r9d,%ecx
0x00007f09c84c8e62: imul $0xadb4a92d,%r10d,%r9d
0x00007f09c84c8e69: add (%rsp),%r9d
0x00007f09c84c8e6d: imul $0xd36d884b,%ecx,%ebx
0x00007f09c84c8e73: imul $0xadb4a92d,%r9d,%r10d
0x00007f09c84c8e7a: add (%rsp),%r10d
0x00007f09c84c8e7e: mov %r10d,0x10(%rax)
0x00007f09c84c8e82: mov %ebx,%r8d
0x00007f09c84c8e85: shr $0x10,%r8d
0x00007f09c84c8e89: xor %ebx,%r8d
0x00007f09c84c8e8c: mov %esi,%ebx
0x00007f09c84c8e8e: shl $0x9,%ebx
0x00007f09c84c8e91: movslq %r8d,%rcx
0x00007f09c84c8e94: rorx $0x6,%r11d,%r8d
0x00007f09c84c8e9a: xor %esi,%r8d
0x00007f09c84c8e9d: xor %ebx,%r8d
0x00007f09c84c8ea0: add %r8d,%r9d
0x00007f09c84c8ea3: shl $0x20,%rcx
0x00007f09c84c8ea7: mov %r9d,%ebx
0x00007f09c84c8eaa: shr $0x10,%ebx
0x00007f09c84c8ead: xor %r9d,%ebx
0x00007f09c84c8eb0: rorx $0x6,%r8d,%r11d
0x00007f09c84c8eb6: imul $0xd36d884b,%ebx,%r9d
0x00007f09c84c8ebd: rorx $0x13,%esi,%ebx
0x00007f09c84c8ec3: xor %r8d,%ebx
0x00007f09c84c8ec6: xor %ebx,%r11d
0x00007f09c84c8ec9: mov %r9d,%r8d
0x00007f09c84c8ecc: shr $0x10,%r8d
0x00007f09c84c8ed0: xor %r9d,%r8d
0x00007f09c84c8ed3: rorx $0x13,%ebx,%esi
0x00007f09c84c8ed9: mov %esi,0x18(%rax)
0x00007f09c84c8edc: imul $0xd36d884b,%r8d,%r9d
0x00007f09c84c8ee3: shl $0x9,%ebx
0x00007f09c84c8ee6: xor %ebx,%r11d
0x00007f09c84c8ee9: mov %r11d,0x14(%rax)
0x00007f09c84c8eed: mov %r9d,%ebx
0x00007f09c84c8ef0: shr $0x10,%ebx
0x00007f09c84c8ef3: xor %r9d,%ebx
0x00007f09c84c8ef6: movslq %ebx,%r8
0x00007f09c84c8ef9: xor %rcx,%r8 ; {no_reloc}
0x00007f09c84c8efc: cmp 0x10(%rsp),%rdx
0x00007f09c84c8f01: jae 0x00007f09c84c94d4
0x00007f09c84c8f07: cmp 0xc(%rsp),%r14d
0x00007f09c84c8f0c: jae 0x00007f09c84c94db
0x00007f09c84c8f12: mov 0x458(%r15),%rcx
0x00007f09c84c8f19: movslq %edi,%rbx
0x00007f09c84c8f1c: mov %r8d,%r9d
0x00007f09c84c8f1f: vmovq %xmm3,%rdx
0x00007f09c84c8f24: mov %r9b,0x10(%rdx,%rdi,1)
0x00007f09c84c8f29: shr $0x8,%r8
0x00007f09c84c8f2d: mov %r8d,%r9d
0x00007f09c84c8f30: mov %r9b,0x11(%rdx,%rbx,1)
0x00007f09c84c8f35: inc %r14d
0x00007f09c84c8f38: shr $0x8,%r8
0x00007f09c84c8f3c: mov %r8,%rdi
0x00007f09c84c8f3f: shr $0x8,%rdi
0x00007f09c84c8f43: mov %r8d,%r9d
0x00007f09c84c8f46: mov %r9b,0x12(%rdx,%rbx,1)
0x00007f09c84c8f4b: mov %edi,%r9d
0x00007f09c84c8f4e: mov %r9b,0x13(%rdx,%rbx,1)
0x00007f09c84c8f53: shr $0x8,%rdi
0x00007f09c84c8f57: mov %rdi,%rdx
0x00007f09c84c8f5a: shr $0x8,%rdx
0x00007f09c84c8f5e: mov %edi,%r8d
0x00007f09c84c8f61: vmovq %xmm3,%r9
0x00007f09c84c8f66: mov %r8b,0x14(%r9,%rbx,1)
0x00007f09c84c8f6b: mov %edx,%r9d
0x00007f09c84c8f6e: vmovq %xmm3,%r8
0x00007f09c84c8f73: mov %r9b,0x15(%r8,%rbx,1)
0x00007f09c84c8f78: shr $0x8,%rdx
0x00007f09c84c8f7c: mov %rdx,%r9
0x00007f09c84c8f7f: shr $0x8,%r9
0x00007f09c84c8f83: mov %edx,%r8d
0x00007f09c84c8f86: vmovq %xmm3,%rdi
0x00007f09c84c8f8b: mov %r8b,0x16(%rdi,%rbx,1)
0x00007f09c84c8f90: mov %r9d,%r9d
0x00007f09c84c8f93: mov %r9b,0x17(%rdi,%rbx,1) ; ImmutableOopMap {rdi=Oop rax=Oop xmm0=Oop xmm3=Oop }
;*goto {reexecute=1 rethrow=0 return_oop=0}
; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
0x00007f09c84c8f98: test %eax,(%rcx) ; {poll}
0x00007f09c84c8f9a: vmovd %xmm1,%r8d
0x00007f09c84c8f9f: dec %r8d
0x00007f09c84c8fa2: cmp %ebp,%r8d
0x00007f09c84c8fa5: jle 0x00007f09c84c8fb4
0x00007f09c84c8fa7: vmovd %r8d,%xmm1
0x00007f09c84c8fac: mov %r14d,%edi
0x00007f09c84c8faf: jmp 0x00007f09c84c8e36
0x00007f09c84c8fb4: test %r8d,%r8d
0x00007f09c84c8fb7: jle 0x00007f09c84c94e2
0x00007f09c84c8fbd: vmovd %xmm1,%r9d
0x00007f09c84c8fc2: dec %r9d
0x00007f09c84c8fc5: vmovd %r9d,%xmm2
0x00007f09c84c8fca: jmp 0x00007f09c84c8fd5
0x00007f09c84c8fcc: nopl 0x0(%rax)
0x00007f09c84c8fd0: vmovq %xmm1,%rax
0x00007f09c84c8fd5: lea (%r11,%r10,1),%edi
0x00007f09c84c8fd9: xor %r11d,%esi
0x00007f09c84c8fdc: lea 0x7(%r14),%ecx
0x00007f09c84c8fe0: rorx $0x13,%esi,%edx
0x00007f09c84c8fe6: movslq %ecx,%r9
0x00007f09c84c8fe9: mov %esi,%ebp
0x00007f09c84c8feb: shl $0x9,%ebp
0x00007f09c84c8fee: add $0xfffffffffffffff9,%r9
0x00007f09c84c8ff2: mov %edi,%r8d
0x00007f09c84c8ff5: shr $0x10,%r8d
0x00007f09c84c8ff9: xor %edi,%r8d
0x00007f09c84c8ffc: imul $0xadb4a92d,%r10d,%edi
0x00007f09c84c9003: add (%rsp),%edi
0x00007f09c84c9006: imul $0xd36d884b,%r8d,%r8d
0x00007f09c84c900d: imul $0xadb4a92d,%edi,%ebx
0x00007f09c84c9013: add (%rsp),%ebx
0x00007f09c84c9016: mov %ebx,0x10(%rax)
0x00007f09c84c9019: mov %r8d,%r10d
0x00007f09c84c901c: shr $0x10,%r10d
0x00007f09c84c9020: xor %r8d,%r10d
0x00007f09c84c9023: rorx $0x6,%r11d,%r11d
0x00007f09c84c9029: xor %esi,%r11d
0x00007f09c84c902c: xor %ebp,%r11d
0x00007f09c84c902f: add %r11d,%edi
0x00007f09c84c9032: xor %r11d,%edx
0x00007f09c84c9035: imul $0xd36d884b,%r10d,%ebp
0x00007f09c84c903c: rorx $0x13,%edx,%r8d
0x00007f09c84c9042: mov %r8d,0x18(%rax)
0x00007f09c84c9046: mov %ebp,%r10d
0x00007f09c84c9049: shr $0x10,%r10d
0x00007f09c84c904d: xor %ebp,%r10d
0x00007f09c84c9050: mov %edx,%r13d
0x00007f09c84c9053: shl $0x9,%r13d
0x00007f09c84c9057: movslq %r10d,%rsi
0x00007f09c84c905a: mov %edi,%ebp
0x00007f09c84c905c: shr $0x10,%ebp
0x00007f09c84c905f: xor %edi,%ebp
0x00007f09c84c9061: shl $0x20,%rsi
0x00007f09c84c9065: imul $0xd36d884b,%ebp,%r10d
0x00007f09c84c906c: rorx $0x6,%r11d,%ebp
0x00007f09c84c9072: xor %edx,%ebp
0x00007f09c84c9074: xor %r13d,%ebp
0x00007f09c84c9077: mov %ebp,0x14(%rax)
0x00007f09c84c907a: mov %r10d,%r11d
0x00007f09c84c907d: shr $0x10,%r11d
0x00007f09c84c9081: xor %r10d,%r11d
0x00007f09c84c9084: imul $0xd36d884b,%r11d,%r11d
0x00007f09c84c908b: mov %r11d,%r10d
0x00007f09c84c908e: shr $0x10,%r10d
0x00007f09c84c9092: xor %r11d,%r10d
0x00007f09c84c9095: movslq %r10d,%rdx ; {no_reloc}
0x00007f09c84c9098: xor %rsi,%rdx
0x00007f09c84c909b: cmp 0x10(%rsp),%r9
0x00007f09c84c90a0: jae 0x00007f09c84c9476
0x00007f09c84c90a6: cmp 0xc(%rsp),%ecx
0x00007f09c84c90aa: jae 0x00007f09c84c9490
0x00007f09c84c90b0: lea (%rbx,%rbp,1),%edi
0x00007f09c84c90b3: mov %ebp,%r10d
0x00007f09c84c90b6: xor %r8d,%r10d
0x00007f09c84c90b9: vmovd %xmm2,%r9d
0x00007f09c84c90be: dec %r9d
0x00007f09c84c90c1: rorx $0x13,%r10d,%r8d
0x00007f09c84c90c7: mov %r10d,%esi
0x00007f09c84c90ca: shl $0x9,%esi
0x00007f09c84c90cd: rorx $0x6,%ebp,%ebp
0x00007f09c84c90d3: xor %r10d,%ebp
0x00007f09c84c90d6: xor %esi,%ebp
0x00007f09c84c90d8: xor %ebp,%r8d
0x00007f09c84c90db: mov %edi,%r11d
0x00007f09c84c90de: shr $0x10,%r11d
0x00007f09c84c90e2: xor %edi,%r11d
0x00007f09c84c90e5: rorx $0x13,%r8d,%esi
0x00007f09c84c90eb: mov %esi,0x18(%rax)
0x00007f09c84c90ee: imul $0xd36d884b,%r11d,%r11d
0x00007f09c84c90f5: mov %r8d,%edi
0x00007f09c84c90f8: shl $0x9,%edi
0x00007f09c84c90fb: mov %r11d,%r10d
0x00007f09c84c90fe: shr $0x10,%r10d
0x00007f09c84c9102: xor %r11d,%r10d
0x00007f09c84c9105: rorx $0x6,%ebp,%r11d
0x00007f09c84c910b: xor %r8d,%r11d
0x00007f09c84c910e: xor %edi,%r11d
0x00007f09c84c9111: mov %r11d,0x14(%rax)
0x00007f09c84c9115: imul $0xd36d884b,%r10d,%edi
0x00007f09c84c911c: imul $0xadb4a92d,%ebx,%r10d
0x00007f09c84c9123: add (%rsp),%r10d
0x00007f09c84c9127: lea (%r10,%rbp,1),%ebx
0x00007f09c84c912b: mov %edi,%r8d
0x00007f09c84c912e: shr $0x10,%r8d
0x00007f09c84c9132: xor %edi,%r8d
0x00007f09c84c9135: mov %ebx,%ebp
0x00007f09c84c9137: shr $0x10,%ebp
0x00007f09c84c913a: xor %ebx,%ebp
0x00007f09c84c913c: movslq %r8d,%r8
0x00007f09c84c913f: imul $0xd36d884b,%ebp,%edi
0x00007f09c84c9145: shl $0x20,%r8
0x00007f09c84c9149: mov %edi,%ebp
0x00007f09c84c914b: shr $0x10,%ebp
0x00007f09c84c914e: xor %edi,%ebp
0x00007f09c84c9150: imul $0xadb4a92d,%r10d,%r10d
0x00007f09c84c9157: add (%rsp),%r10d
0x00007f09c84c915b: mov %r10d,0x10(%rax)
0x00007f09c84c915f: vmovq %rax,%xmm1
0x00007f09c84c9164: imul $0xd36d884b,%ebp,%eax
0x00007f09c84c916a: mov %edx,%ebx
0x00007f09c84c916c: vmovq %xmm3,%rdi
0x00007f09c84c9171: mov %bl,0x10(%rdi,%r14,1)
0x00007f09c84c9176: mov %eax,%edi
0x00007f09c84c9178: shr $0x10,%edi
0x00007f09c84c917b: xor %eax,%edi
0x00007f09c84c917d: shr $0x8,%rdx
0x00007f09c84c9181: movslq %edi,%rbx
0x00007f09c84c9184: xor %rbx,%r8
0x00007f09c84c9187: mov %edx,%edi
0x00007f09c84c9189: shr $0x8,%rdx
0x00007f09c84c918d: movslq %r14d,%rax
0x00007f09c84c9190: vmovq %xmm3,%rbx
0x00007f09c84c9195: mov %dil,0x11(%rbx,%rax,1) ; {no_reloc}
0x00007f09c84c919a: mov %edx,%ebx
0x00007f09c84c919c: vmovq %xmm3,%rdi
0x00007f09c84c91a1: mov %bl,0x12(%rdi,%rax,1)
0x00007f09c84c91a5: shr $0x8,%rdx
0x00007f09c84c91a9: mov %edx,%edi
0x00007f09c84c91ab: vmovq %xmm3,%rbx
0x00007f09c84c91b0: mov %dil,0x13(%rbx,%rax,1)
0x00007f09c84c91b5: lea 0x1(%rcx),%edi
0x00007f09c84c91b8: lea 0x8(%rcx),%r14d
0x00007f09c84c91bc: shr $0x8,%rdx
0x00007f09c84c91c0: movslq %r14d,%rbx
0x00007f09c84c91c3: mov %edx,%ebp
0x00007f09c84c91c5: vmovq %xmm3,%r13
0x00007f09c84c91ca: mov %bpl,0x14(%r13,%rax,1)
0x00007f09c84c91cf: add $0xfffffffffffffff9,%rbx
0x00007f09c84c91d3: shr $0x8,%rdx
0x00007f09c84c91d7: mov %rdx,%r13
0x00007f09c84c91da: shr $0x8,%r13
0x00007f09c84c91de: mov %edx,%ebp
0x00007f09c84c91e0: vmovq %xmm3,%rdx
0x00007f09c84c91e5: mov %bpl,0x15(%rdx,%rax,1)
0x00007f09c84c91ea: mov %r13d,%ebp
0x00007f09c84c91ed: mov %bpl,0x16(%rdx,%rax,1)
0x00007f09c84c91f2: shr $0x8,%r13
0x00007f09c84c91f6: mov %r13d,%ebp
0x00007f09c84c91f9: mov %bpl,0x17(%rdx,%rax,1)
0x00007f09c84c91fe: cmp 0x10(%rsp),%rbx
0x00007f09c84c9203: jae 0x00007f09c84c9483
0x00007f09c84c9209: cmp 0xc(%rsp),%r14d
0x00007f09c84c920e: jae 0x00007f09c84c949d
0x00007f09c84c9214: mov 0x458(%r15),%rbx
0x00007f09c84c921b: movslq %ecx,%rdx
0x00007f09c84c921e: mov %r8d,%edi
0x00007f09c84c9221: vmovq %xmm3,%rcx
0x00007f09c84c9226: mov %dil,0x11(%rcx,%rdx,1)
0x00007f09c84c922b: shr $0x8,%r8
0x00007f09c84c922f: mov %r8d,%ecx
0x00007f09c84c9232: vmovq %xmm3,%rdi
0x00007f09c84c9237: mov %cl,0x12(%rdi,%rdx,1)
0x00007f09c84c923b: inc %r14d
0x00007f09c84c923e: shr $0x8,%r8
0x00007f09c84c9242: mov %r8,%rdi
0x00007f09c84c9245: shr $0x8,%rdi
0x00007f09c84c9249: mov %r8d,%r8d
0x00007f09c84c924c: vmovq %xmm3,%rcx
0x00007f09c84c9251: mov %r8b,0x13(%rcx,%rdx,1)
0x00007f09c84c9256: mov %edi,%ecx
0x00007f09c84c9258: vmovq %xmm3,%r8
0x00007f09c84c925d: mov %cl,0x14(%r8,%rdx,1)
0x00007f09c84c9262: shr $0x8,%rdi
0x00007f09c84c9266: mov %rdi,%rax
0x00007f09c84c9269: shr $0x8,%rax
0x00007f09c84c926d: mov %edi,%r8d
0x00007f09c84c9270: vmovq %xmm3,%rcx
0x00007f09c84c9275: mov %r8b,0x15(%rcx,%rdx,1)
0x00007f09c84c927a: mov %eax,%ecx
0x00007f09c84c927c: vmovq %xmm3,%r8
0x00007f09c84c9281: mov %cl,0x16(%r8,%rdx,1)
0x00007f09c84c9286: shr $0x8,%rax
0x00007f09c84c928a: mov %rax,%rcx
0x00007f09c84c928d: shr $0x8,%rcx
0x00007f09c84c9291: mov %eax,%r8d
0x00007f09c84c9294: vmovq %xmm3,%rdi ; {no_reloc}
0x00007f09c84c9299: mov %r8b,0x17(%rdi,%rdx,1)
0x00007f09c84c929e: mov %ecx,%ecx
0x00007f09c84c92a0: mov %cl,0x18(%rdi,%rdx,1) ; ImmutableOopMap {rdi=Oop xmm0=Oop xmm1=Oop xmm3=Oop }
;*goto {reexecute=1 rethrow=0 return_oop=0}
; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
0x00007f09c84c92a4: test %eax,(%rbx) ; {poll}
0x00007f09c84c92a6: vmovd %xmm2,%r9d
0x00007f09c84c92ab: add $0xfffffffe,%r9d
0x00007f09c84c92af: vmovd %r9d,%xmm2
0x00007f09c84c92b4: test %r9d,%r9d
0x00007f09c84c92b7: jg 0x00007f09c84c8fd0
0x00007f09c84c92bd: vmovq %xmm1,%rax
0x00007f09c84c92c2: vmovd %xmm2,%r8d
0x00007f09c84c92c7: cmp $0xffffffff,%r8d
0x00007f09c84c92cb: jle 0x00007f09c84c944d
0x00007f09c84c92d1: mov %r8d,%r13d
0x00007f09c84c92d4: lea (%r11,%r10,1),%r9d
0x00007f09c84c92d8: xor %r11d,%esi
0x00007f09c84c92db: lea 0x7(%r14),%edi
0x00007f09c84c92df: rorx $0x13,%esi,%ebx
0x00007f09c84c92e5: movslq %edi,%rdx
0x00007f09c84c92e8: mov %esi,%r8d
0x00007f09c84c92eb: shl $0x9,%r8d
0x00007f09c84c92ef: add $0xfffffffffffffff9,%rdx
0x00007f09c84c92f3: rorx $0x6,%r11d,%ebp
0x00007f09c84c92f9: xor %esi,%ebp
0x00007f09c84c92fb: xor %r8d,%ebp
0x00007f09c84c92fe: xor %ebp,%ebx
0x00007f09c84c9300: imul $0xadb4a92d,%r10d,%r11d
0x00007f09c84c9307: add (%rsp),%r11d
0x00007f09c84c930b: lea (%r11,%rbp,1),%ecx
0x00007f09c84c930f: rorx $0x13,%ebx,%esi
0x00007f09c84c9315: mov %esi,0x18(%rax)
0x00007f09c84c9318: mov %ecx,%r8d
0x00007f09c84c931b: shr $0x10,%r8d
0x00007f09c84c931f: xor %ecx,%r8d
0x00007f09c84c9322: imul $0xadb4a92d,%r11d,%r10d
0x00007f09c84c9329: add (%rsp),%r10d
0x00007f09c84c932d: mov %r10d,0x10(%rax)
0x00007f09c84c9331: imul $0xd36d884b,%r8d,%r11d
0x00007f09c84c9338: mov %ebx,%r8d
0x00007f09c84c933b: shl $0x9,%r8d
0x00007f09c84c933f: mov %r11d,%ecx
0x00007f09c84c9342: shr $0x10,%ecx
0x00007f09c84c9345: xor %r11d,%ecx
0x00007f09c84c9348: rorx $0x6,%ebp,%r11d
0x00007f09c84c934e: xor %ebx,%r11d
0x00007f09c84c9351: xor %r8d,%r11d
0x00007f09c84c9354: mov %r11d,0x14(%rax)
0x00007f09c84c9358: imul $0xd36d884b,%ecx,%r8d
0x00007f09c84c935f: mov %r9d,%ebx
0x00007f09c84c9362: shr $0x10,%ebx
0x00007f09c84c9365: xor %r9d,%ebx
0x00007f09c84c9368: mov %r8d,%r9d
0x00007f09c84c936b: shr $0x10,%r9d
0x00007f09c84c936f: xor %r8d,%r9d
0x00007f09c84c9372: imul $0xd36d884b,%ebx,%ebx
0x00007f09c84c9378: movslq %r9d,%r8
0x00007f09c84c937b: mov %ebx,%r9d
0x00007f09c84c937e: shr $0x10,%r9d
0x00007f09c84c9382: xor %ebx,%r9d
0x00007f09c84c9385: imul $0xd36d884b,%r9d,%ecx
0x00007f09c84c938c: mov %ecx,%r9d
0x00007f09c84c938f: shr $0x10,%r9d
0x00007f09c84c9393: xor %ecx,%r9d
0x00007f09c84c9396: movslq %r9d,%r9
0x00007f09c84c9399: shl $0x20,%r9
0x00007f09c84c939d: xor %r9,%r8
0x00007f09c84c93a0: cmp 0x10(%rsp),%rdx ; {no_reloc}
0x00007f09c84c93a5: jae 0x00007f09c84c94a8
0x00007f09c84c93ab: cmp 0xc(%rsp),%edi
0x00007f09c84c93af: jae 0x00007f09c84c94a8
0x00007f09c84c93b5: mov 0x458(%r15),%rcx
0x00007f09c84c93bc: mov %r8d,%r9d
0x00007f09c84c93bf: vmovq %xmm3,%rbx
0x00007f09c84c93c4: mov %r9b,0x10(%rbx,%r14,1)
0x00007f09c84c93c9: inc %edi
0x00007f09c84c93cb: shr $0x8,%r8
0x00007f09c84c93cf: movslq %r14d,%rdx
0x00007f09c84c93d2: mov %r8d,%ebx
0x00007f09c84c93d5: vmovq %xmm3,%r9
0x00007f09c84c93da: mov %bl,0x11(%r9,%rdx,1)
0x00007f09c84c93df: shr $0x8,%r8
0x00007f09c84c93e3: mov %r8,%rbx
0x00007f09c84c93e6: shr $0x8,%rbx
0x00007f09c84c93ea: mov %r8d,%r8d
0x00007f09c84c93ed: mov %r8b,0x12(%r9,%rdx,1)
0x00007f09c84c93f2: mov %ebx,%r9d
0x00007f09c84c93f5: vmovq %xmm3,%r8
0x00007f09c84c93fa: mov %r9b,0x13(%r8,%rdx,1)
0x00007f09c84c93ff: shr $0x8,%rbx
0x00007f09c84c9403: mov %rbx,%r9
0x00007f09c84c9406: shr $0x8,%r9
0x00007f09c84c940a: mov %ebx,%r8d
0x00007f09c84c940d: vmovq %xmm3,%rbx
0x00007f09c84c9412: mov %r8b,0x14(%rbx,%rdx,1)
0x00007f09c84c9417: mov %r9d,%r8d
0x00007f09c84c941a: mov %r8b,0x15(%rbx,%rdx,1)
0x00007f09c84c941f: shr $0x8,%r9
0x00007f09c84c9423: mov %r9,%r8
0x00007f09c84c9426: shr $0x8,%r8
0x00007f09c84c942a: mov %r9d,%r9d
0x00007f09c84c942d: mov %r9b,0x16(%rbx,%rdx,1)
0x00007f09c84c9432: mov %r8d,%r8d
0x00007f09c84c9435: mov %r8b,0x17(%rbx,%rdx,1) ; ImmutableOopMap {rbx=Oop rax=Oop xmm0=Oop xmm3=Oop }
;*goto {reexecute=1 rethrow=0 return_oop=0}
; - (reexecute) java.util.random.RandomGenerator::nextBytes@58 (line 488)
0x00007f09c84c943a: test %eax,(%rcx) ; {poll}
0x00007f09c84c943c: dec %r13d
0x00007f09c84c943f: cmp $0xffffffff,%r13d
0x00007f09c84c9443: jle 0x00007f09c84c9450
0x00007f09c84c9445: mov %edi,%r14d
0x00007f09c84c9448: jmp 0x00007f09c84c92d4
0x00007f09c84c944d: mov %r14d,%edi
0x00007f09c84c9450: vmovq %xmm3,%rdx
0x00007f09c84c9455: mov 0xc(%rsp),%r9d
0x00007f09c84c945a: cmp %r9d,%edi
0x00007f09c84c945d: jl 0x00007f09c84c9514
0x00007f09c84c9463: add $0x30,%rsp
0x00007f09c84c9467: pop %rbp
0x00007f09c84c9468: cmp 0x450(%r15),%rsp ; {poll_return}
0x00007f09c84c946f: ja 0x00007f09c84c954c
0x00007f09c84c9475: ret
0x00007f09c84c9476: mov %rdx,%r8
0x00007f09c84c9479: vmovd %xmm2,%r9d
0x00007f09c84c947e: mov %r14d,%edi
0x00007f09c84c9481: jmp 0x00007f09c84c9488
0x00007f09c84c9483: vmovq %xmm1,%rax
0x00007f09c84c9488: mov %r9d,%r13d
0x00007f09c84c948b: mov %edi,%r14d
0x00007f09c84c948e: jmp 0x00007f09c84c94a8
0x00007f09c84c9490: mov %rdx,%r8
0x00007f09c84c9493: vmovd %xmm2,%r9d
0x00007f09c84c9498: mov %r14d,%edi
0x00007f09c84c949b: jmp 0x00007f09c84c94a2
0x00007f09c84c949d: vmovq %xmm1,%rax
0x00007f09c84c94a2: mov %r9d,%r13d
0x00007f09c84c94a5: mov %edi,%r14d
0x00007f09c84c94a8: mov $0xffffff76,%esi
0x00007f09c84c94ad: mov %rax,%rbp
0x00007f09c84c94b0: vmovsd %xmm3,(%rsp)
0x00007f09c84c94b5: mov %r14d,0x8(%rsp)
0x00007f09c84c94ba: mov %r13d,0x10(%rsp)
0x00007f09c84c94bf: mov %r8,0x18(%rsp)
0x00007f09c84c94c4: data16 xchg %ax,%ax
0x00007f09c84c94c7: call 0x00007f09c7da9c00 ; ImmutableOopMap {rbp=Oop [0]=Oop }
;*ifle {reexecute=1 rethrow=0 return_oop=0}
; - (reexecute) java.util.random.RandomGenerator::nextBytes@35 (line 486)
; {runtime_call UncommonTrapBlob}
0x00007f09c84c94cc: nopl 0x30008bc(%rax,%rax,1) ; {other}
0x00007f09c84c94d4: vmovd %xmm1,%r9d
0x00007f09c84c94d9: jmp 0x00007f09c84c9488
0x00007f09c84c94db: vmovd %xmm1,%r9d
0x00007f09c84c94e0: jmp 0x00007f09c84c94a2
0x00007f09c84c94e2: vmovd %r8d,%xmm2
0x00007f09c84c94e7: jmp 0x00007f09c84c92c2
0x00007f09c84c94ec: mov $0xffffff76,%esi
0x00007f09c84c94f1: mov %rdx,(%rsp)
0x00007f09c84c94f5: mov %r9d,0x8(%rsp)
0x00007f09c84c94fa: mov %r8d,0xc(%rsp)
0x00007f09c84c94ff: vmovsd %xmm0,0x10(%rsp)
0x00007f09c84c9505: xchg %ax,%ax
0x00007f09c84c9507: call 0x00007f09c7da9c00 ; ImmutableOopMap {[0]=Oop [16]=Oop }
;*ifle {reexecute=1 rethrow=0 return_oop=0}
; - (reexecute) java.util.random.RandomGenerator::nextBytes@15 (line 484)
; {runtime_call UncommonTrapBlob}
0x00007f09c84c950c: nopl 0x40008fc(%rax,%rax,1) ; {other}
0x00007f09c84c9514: mov $0xffffff45,%esi
0x00007f09c84c9519: mov %rdx,%rbp
0x00007f09c84c951c: mov %edi,0x8(%rsp)
0x00007f09c84c9520: mov %r9d,0xc(%rsp)
0x00007f09c84c9525: vmovsd %xmm0,0x10(%rsp)
0x00007f09c84c952b: call 0x00007f09c7da9c00 ; ImmutableOopMap {rbp=Oop [16]=Oop }
;*if_icmpge {reexecute=1 rethrow=0 return_oop=0}
; - (reexecute) java.util.random.RandomGenerator::nextBytes@63 (line 489)
; {runtime_call UncommonTrapBlob}
0x00007f09c84c9530: nopl 0x5000920(%rax,%rax,1) ; {other}
0x00007f09c84c9538: mov $0xfffffff6,%esi
0x00007f09c84c953d: xchg %ax,%ax
0x00007f09c84c953f: call 0x00007f09c7da9c00 ; ImmutableOopMap {}
;*arraylength {reexecute=0 rethrow=0 return_oop=0}
; - java.util.random.RandomGenerator::nextBytes@3 (line 483)
; {runtime_call UncommonTrapBlob}
0x00007f09c84c9544: nopl 0x6000934(%rax,%rax,1) ; {other}
0x00007f09c84c954c: movabs $0x7f09c84c9468,%r10 ; {internal_word}
0x00007f09c84c9556: mov %r10,0x468(%r15)
0x00007f09c84c955d: jmp 0x00007f09c7daad00 ; {runtime_call SafepointBlob}
0x00007f09c84c9562: call Stub::nmethod_entry_barrier ; {runtime_call StubRoutines (final stubs)}
0x00007f09c84c9567: jmp 0x00007f09c84c8dda
0x00007f09c84c956c: hlt
0x00007f09c84c956d: hlt
0x00007f09c84c956e: hlt
0x00007f09c84c956f: hlt
[Exception Handler]
0x00007f09c84c9570: jmp 0x00007f09c7e6b100 ; {no_reloc}
[Deopt Handler Code]
0x00007f09c84c9575: call 0x00007f09c84c957a
0x00007f09c84c957a: subq $0x5,(%rsp)
0x00007f09c84c957f: jmp 0x00007f09c7da9fa0 ; {runtime_call DeoptimizationBlob}
0x00007f09c84c9584: hlt
0x00007f09c84c9585: hlt
0x00007f09c84c9586: hlt
0x00007f09c84c9587: hlt
--------------------------------------------------------------------------------
[/Disassembly]
(I'm not familiar with assembly) I guess loop unrolling is working?
I'm not sure either, but there are vmovd and vmovq instructions, which move double or quad words at once, so it appears vectorized. But using Unsafe/ByteArrayLittleEndian explicitly still seems better optimized from your results; I guess that might be because you know the input random number size (int/long sizes). Can you try how putLongUnaligned etc. work, as the VH implementation delegates to the unaligned versions (for plain get/set)?
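To make the comparison concrete, here is a minimal sketch of the two fill strategies being discussed. It is not the code from this PR: `NextBytesSketch`, `fillBytewise`, `fillWordwise` and `counter` are hypothetical names, and the public `MethodHandles.byteArrayViewVarHandle` API is used as a stand-in for the JDK-internal `Unsafe::putLongUnaligned` and `ByteArrayLittleEndian`, which are not accessible from application code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.function.LongSupplier;

class NextBytesSketch {
    // Public-API stand-in (assumption) for the internal unaligned/little-endian helpers.
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Byte-at-a-time fill: each long is written low-order byte first. */
    static void fillBytewise(LongSupplier src, byte[] bytes) {
        int i = 0;
        int len = bytes.length;
        for (int words = len >> 3; words-- > 0; ) {
            long rnd = src.getAsLong();
            for (int n = 8; n-- > 0; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        if (i < len) {
            // Tail: fewer than 8 bytes remain, taken from one more long.
            for (long rnd = src.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    /** Word-at-a-time fill: one 8-byte little-endian store per generated long. */
    static void fillWordwise(LongSupplier src, byte[] bytes) {
        int i = 0;
        int len = bytes.length;
        for (int end = len & ~7; i < end; i += Long.BYTES) {
            LONG_LE.set(bytes, i, src.getAsLong());
        }
        if (i < len) {
            // Same byte-wise tail as above.
            for (long rnd = src.getAsLong(); i < len; rnd >>>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
    }

    /** Deterministic stand-in for a generator, so both fills see the same longs. */
    static LongSupplier counter(long seed) {
        long[] state = {seed};
        return () -> state[0] += 0x9E3779B97F4A7C15L;
    }

    public static void main(String[] args) {
        byte[] a = new byte[21];
        byte[] b = new byte[21];
        fillBytewise(counter(42), a);
        fillWordwise(counter(42), b);
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both methods should produce identical arrays for the same source of longs; the only difference is whether the eight stores per word are written out individually or as a single store.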
Use Unsafe::putIntUnaligned/Unsafe::putLongUnaligned:
Results
Benchmark (length) Mode Cnt Score Error Units
RandomBenchmark.L32X64MixRandom 0 thrpt 5 1524860.609 ± 25736.732 ops/ms
RandomBenchmark.L32X64MixRandom 1 thrpt 5 215406.292 ± 2337.313 ops/ms
RandomBenchmark.L32X64MixRandom 2 thrpt 5 201345.579 ± 940.754 ops/ms
RandomBenchmark.L32X64MixRandom 3 thrpt 5 191355.115 ± 1805.205 ops/ms
RandomBenchmark.L32X64MixRandom 4 thrpt 5 184609.621 ± 1255.979 ops/ms
RandomBenchmark.L32X64MixRandom 5 thrpt 5 164468.663 ± 1677.559 ops/ms
RandomBenchmark.L32X64MixRandom 6 thrpt 5 156960.655 ± 663.464 ops/ms
RandomBenchmark.L32X64MixRandom 7 thrpt 5 153595.183 ± 3983.234 ops/ms
RandomBenchmark.L32X64MixRandom 8 thrpt 5 186632.617 ± 425.385 ops/ms
RandomBenchmark.L32X64MixRandom 10 thrpt 5 104736.408 ± 345.176 ops/ms
RandomBenchmark.L32X64MixRandom 12 thrpt 5 105447.874 ± 399.328 ops/ms
RandomBenchmark.L32X64MixRandom 14 thrpt 5 95664.265 ± 80.052 ops/ms
RandomBenchmark.L32X64MixRandom 16 thrpt 5 109343.697 ± 32.207 ops/ms
RandomBenchmark.L32X64MixRandom 32 thrpt 5 62252.931 ± 469.271 ops/ms
RandomBenchmark.L32X64MixRandom 64 thrpt 5 31358.265 ± 89.965 ops/ms
RandomBenchmark.L32X64MixRandom 128 thrpt 5 16607.450 ± 70.292 ops/ms
RandomBenchmark.L32X64MixRandom 256 thrpt 5 8327.905 ± 9.349 ops/ms
RandomBenchmark.L32X64MixRandom 512 thrpt 5 4379.807 ± 9.959 ops/ms
RandomBenchmark.L32X64MixRandom 1024 thrpt 5 2169.190 ± 0.127 ops/ms
RandomBenchmark.L32X64MixRandom 2048 thrpt 5 1081.397 ± 64.131 ops/ms
RandomBenchmark.L32X64MixRandom 4096 thrpt 5 546.185 ± 0.895 ops/ms
RandomBenchmark.L32X64MixRandom 8192 thrpt 5 273.206 ± 0.236 ops/ms
RandomBenchmark.Random 0 thrpt 5 1523782.776 ± 11592.739 ops/ms
RandomBenchmark.Random 1 thrpt 5 364587.781 ± 23904.474 ops/ms
RandomBenchmark.Random 2 thrpt 5 324850.835 ± 1698.265 ops/ms
RandomBenchmark.Random 3 thrpt 5 290855.010 ± 3524.691 ops/ms
RandomBenchmark.Random 4 thrpt 5 286867.826 ± 58.331 ops/ms
RandomBenchmark.Random 5 thrpt 5 151454.671 ± 525.393 ops/ms
RandomBenchmark.Random 6 thrpt 5 147070.562 ± 1477.003 ops/ms
RandomBenchmark.Random 7 thrpt 5 138053.754 ± 151.065 ops/ms
RandomBenchmark.Random 8 thrpt 5 154585.711 ± 1495.177 ops/ms
RandomBenchmark.Random 10 thrpt 5 92987.135 ± 1284.808 ops/ms
RandomBenchmark.Random 12 thrpt 5 102440.798 ± 204.633 ops/ms
RandomBenchmark.Random 14 thrpt 5 76235.547 ± 64.113 ops/ms
RandomBenchmark.Random 16 thrpt 5 77672.178 ± 28.365 ops/ms
RandomBenchmark.Random 32 thrpt 5 39193.225 ± 40.209 ops/ms
RandomBenchmark.Random 64 thrpt 5 19684.798 ± 7.152 ops/ms
RandomBenchmark.Random 128 thrpt 5 9884.926 ± 1.765 ops/ms
RandomBenchmark.Random 256 thrpt 5 4862.050 ± 1.655 ops/ms
RandomBenchmark.Random 512 thrpt 5 2457.171 ± 1.042 ops/ms
RandomBenchmark.Random 1024 thrpt 5 1228.285 ± 0.736 ops/ms
RandomBenchmark.Random 2048 thrpt 5 615.795 ± 0.977 ops/ms
RandomBenchmark.Random 4096 thrpt 5 311.657 ± 0.124 ops/ms
RandomBenchmark.Random 8192 thrpt 5 152.179 ± 0.031 ops/ms
Use ByteArrayLittleEndian (#14636):
Results
```
Benchmark (length) Mode Cnt Score Error Units
RandomBenchmark.L32X64MixRandom 0 thrpt 5 1528297.256 ± 11983.204 ops/ms
RandomBenchmark.L32X64MixRandom 1 thrpt 5 215656.684 ± 1794.981 ops/ms
RandomBenchmark.L32X64MixRandom 2 thrpt 5 201420.705 ± 1377.903 ops/ms
RandomBenchmark.L32X64MixRandom 3 thrpt 5 190722.759 ± 3562.388 ops/ms
RandomBenchmark.L32X64MixRandom 4 thrpt 5 184578.897 ± 587.992 ops/ms
RandomBenchmark.L32X64MixRandom 5 thrpt 5 164248.972 ± 1153.358 ops/ms
RandomBenchmark.L32X64MixRandom 6 thrpt 5 145869.045 ± 1342.215 ops/ms
RandomBenchmark.L32X64MixRandom 7 thrpt 5 153291.149 ± 4666.694 ops/ms
RandomBenchmark.L32X64MixRandom 8 thrpt 5 163664.923 ± 559.088 ops/ms
RandomBenchmark.L32X64MixRandom 10 thrpt 5 101878.885 ± 322.857 ops/ms
RandomBenchmark.L32X64MixRandom 12 thrpt 5 98918.245 ± 305.201 ops/ms
RandomBenchmark.L32X64MixRandom 14 thrpt 5 95554.296 ± 253.037 ops/ms
RandomBenchmark.L32X64MixRandom 16 thrpt 5 114686.083 ± 10.662 ops/ms
RandomBenchmark.L32X64MixRandom 32 thrpt 5 54694.191 ± 77.666 ops/ms
RandomBenchmark.L32X64MixRandom 64 thrpt 5 29272.233 ± 13.130 ops/ms
RandomBenchmark.L32X64MixRandom 128 thrpt 5 15423.642 ± 13.856 ops/ms
RandomBenchmark.L32X64MixRandom 256 thrpt 5 8007.269 ± 6.237 ops/ms
RandomBenchmark.L32X64MixRandom 512 thrpt 5 4035.672 ± 1.192 ops/ms
RandomBenchmark.L32X64MixRandom 1024 thrpt 5 2389.270 ± 1.732 ops/ms
RandomBenchmark.L32X64MixRandom 2048 thrpt 5 1210.966 ± 0.645 ops/ms
RandomBenchmark.L32X64MixRandom 4096 thrpt 5 609.226 ± 0.026 ops/ms
RandomBenchmark.L32X64MixRandom 8192 thrpt 5 305.380 ± 0.147 ops/ms
RandomBenchmark.Random 0 thrpt 5 1519068.332 ± 17554.468 ops/ms
RandomBenchmark.Random 1 thrpt 5 349320.420 ± 50935.172 ops/ms
RandomBenchmark.Random 2 thrpt 5 325239.890 ± 1852.854 ops/ms
RandomBenchmark.Random 3 thrpt 5 293215.822 ± 5502.425 ops/ms
RandomBenchmark.Random 4 thrpt 5 270030.002 ± 635.288 ops/ms
RandomBenchmark.Random 5 thrpt 5 135824.338 ± 1411.090 ops/ms
RandomBenchmark.Random 6 thrpt 5 131045.378 ± 131.826 ops/ms
RandomBenchmark.Random 7 thrpt 5 123870.748 ± 281.168 ops/ms
RandomBenchmark.Random 8 thrpt 5 159068.553 ± 577.367 ops/ms
RandomBenchmark.Random 10 thrpt 5 97813.949 ± 133.771 ops/ms
RandomBenchmark.Random 12 thrpt 5 104909.089 ± 54.468 ops/ms
RandomBenchmark.Random 14 thrpt 5 75004.214 ± 237.386 ops/ms
RandomBenchmark.Random 16 thrpt 5 78205.257 ± 91.166 ops/ms
RandomBenchmark.Random 32 thrpt 5 39289.218 ± 24.475 ops/ms
RandomBenchmark.Random 64 thrpt 5 19676.129 ± 8.671 ops/ms
RandomBenchmark.Random 128 thrpt 5 9856.330 ± 1.669 ops/ms
RandomBenchmark.Random 256 thrpt 5 4928.997 ± 1.652 ops/ms
RandomBenchmark.Random 512 thrpt 5 2429.244 ± 2.227 ops/ms
RandomBenchmark.Random 1024 thrpt 5 1239.338 ± 0.306 ops/ms
RandomBenchmark.Random 2048 thrpt 5 619.758 ± 0.055 ops/ms
RandomBenchmark.Random 4096 thrpt 5 274.033 ± 0.714 ops/ms
RandomBenchmark.Random 8192 thrpt 5 151.607 ± 0.013 ops/ms
```
The result seems interesting.
The new implementation of ByteArrayLittleEndian in #14636 performs consistently with the old implementation using VarHandle. (This conclusion gives me more confidence in #14636.)
Interestingly, Unsafe::putIntUnaligned/Unsafe::putLongUnaligned is not always faster than the new implementation of ByteArrayLittleEndian, even though it does not have additional bounds checking.
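As an illustration of what "additional bounds checking" means here, the sketch below uses the public byte-array view `VarHandle` (again as a stand-in, since `Unsafe` and `ByteArrayLittleEndian` are JDK-internal). A checked 8-byte store is rejected with `IndexOutOfBoundsException` when fewer than 8 bytes remain at the offset, whereas an `Unsafe` store performs no such check. `BoundsCheckSketch` and `tryPutLong` are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class BoundsCheckSketch {
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Returns true if the 8-byte store at offset was accepted, false if it was rejected. */
    static boolean tryPutLong(byte[] array, int offset, long value) {
        try {
            LONG_LE.set(array, offset, value);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[12];
        System.out.println(tryPutLong(buf, 4, -1L)); // bytes 4..11 fit in the array
        System.out.println(tryPutLong(buf, 5, -1L)); // bytes 5..12 would overrun it
    }
}
```

In a loop whose trip count is derived from the array length, the JIT can often prove such checks redundant, which may be one reason the checked and unchecked variants end up close in the numbers above.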
Can you publish your put_Unaligned code and the one with the updated ByteArrayLittleEndian in two branches in your fork? I suspect something might be off in your code, and I'd like to test it on my end.
Use ByteArrayLittleEndian: https://github.com/Glavo/jdk/tree/random-byte-array
Use putXxxUnaligned: https://github.com/Glavo/jdk/tree/random-unaligned
This is my test server:
glavo@minecraft-server
OS: Ubuntu 20.04.6 LTS x86_64
Kernel: 5.15.0-71-generic
Uptime: 10 days, 2 hours, 42 mins
Packages: 2165 (dpkg), 13 (snap)
Shell: bash 5.0.17
Terminal: /dev/pts/2
CPU: AMD Ryzen 7 5800X (16) @ 4.600GHz
GPU: NVIDIA GeForce GT 710
Memory: 12570MiB / 32011MiB
I need to spend the day updating my server and upgrading some accessories tomorrow. If you are unable to replicate my previous JMH results, I will rerun all tests after upgrading the server.
@SirYwell
- I didn't find any proper tests that ensure that the behavior described in the Javadocs is actually maintained
I updated test/jdk/java/util/Random/NextBytes.java to also test RandomGenerator::nextBytes.
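For readers following along, a Javadoc-conformance check of this kind might look roughly like the sketch below. It is not the contents of NextBytes.java, which are not shown in this thread. It only covers `java.util.Random::nextBytes`, whose Javadoc spells out the fill loop, and the class and method names (`RandomNextBytesCheck`, `expected`, `conforms`) are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

class RandomNextBytesCheck {
    /** Builds the expected bytes with the loop given in the Random::nextBytes Javadoc. */
    static byte[] expected(long seed, int len) {
        Random r = new Random(seed);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; ) {
            for (int rnd = r.nextInt(), n = Math.min(len - i, Integer.SIZE / Byte.SIZE);
                 n-- > 0; rnd >>= Byte.SIZE) {
                bytes[i++] = (byte) rnd;
            }
        }
        return bytes;
    }

    /** Compares nextBytes from an identically seeded Random against the documented loop. */
    static boolean conforms(long seed, int len) {
        byte[] actual = new byte[len];
        new Random(seed).nextBytes(actual);
        return Arrays.equals(actual, expected(seed, len));
    }

    public static void main(String[] args) {
        // Cover lengths around the 4-byte boundaries, including 0.
        for (int len = 0; len <= 17; len++) {
            if (!conforms(12345L, len)) {
                throw new AssertionError("mismatch at length " + len);
            }
        }
        System.out.println("all lengths match");
    }
}
```

Any optimized implementation of `Random::nextBytes` would need to keep passing a check like this, since the byte order and the number of `nextInt()` calls are part of the documented behaviour.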
- I searched through usages of the nextBytes method on GitHub and mostly found a) usages of SecureRandom#nextBytes, which aren't affected by this, and b) usages with small arrays, where the effect isn't that huge.
I just did a quick search for nextBytes inside the JDK. In fact, there are many use cases for Random::nextBytes.
For example, in ZipEntryFreeTest, it is used to fill ten arrays with the length of 2,000,000:
https://github.com/openjdk/jdk/blob/0db63ec76d451295e273c8e3272d013e2c3348ef/test/jdk/java/util/zip/ZipFile/ZipEntryFreeTest.java#L77-L87
It is widely used in unit testing to generate random test data. Optimizing it can help developers reduce the time spent running tests.
Running with the benchmark in the patch. Using Unsafe put:
Benchmark (algo) (length) Mode Cnt Score Error Units
RandomGeneratorNextBytes.testNextBytes Random 1 thrpt 12 139652.882 ± 1352.622 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 2 thrpt 12 140331.882 ± 1282.855 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 3 thrpt 12 139557.391 ± 1175.079 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 4 thrpt 12 138449.059 ± 1322.123 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 6 thrpt 12 71200.906 ± 951.863 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 7 thrpt 12 72158.561 ± 334.151 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 9 thrpt 12 48154.027 ± 209.683 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 10 thrpt 12 44386.601 ± 7802.001 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 48 thrpt 12 11711.709 ± 51.540 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 512 thrpt 12 984.135 ± 114.699 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 1000 thrpt 12 499.867 ± 75.867 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 1 thrpt 12 291444.618 ± 908.968 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 2 thrpt 12 279086.952 ± 785.740 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 3 thrpt 12 272168.427 ± 1179.134 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 4 thrpt 12 225741.559 ± 108229.595 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 6 thrpt 12 93584.669 ± 4203.102 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 7 thrpt 12 94964.676 ± 14241.917 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 9 thrpt 12 145814.460 ± 464.698 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 10 thrpt 12 142188.443 ± 753.706 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 48 thrpt 12 55356.994 ± 142.517 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 512 thrpt 12 5963.217 ± 33.529 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 1000 thrpt 12 2964.744 ± 23.344 ops/ms
Using bytearray:
Benchmark (algo) (length) Mode Cnt Score Error Units
RandomGeneratorNextBytes.testNextBytes Random 1 thrpt 12 139322.101 ± 1176.140 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 2 thrpt 12 99385.014 ± 38348.493 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 3 thrpt 12 81765.495 ± 2291.560 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 4 thrpt 12 89411.431 ± 31054.806 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 6 thrpt 12 42040.396 ± 3441.116 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 7 thrpt 12 38358.942 ± 2379.015 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 9 thrpt 12 31104.518 ± 1299.442 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 10 thrpt 12 28871.366 ± 1634.108 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 48 thrpt 12 8907.501 ± 506.556 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 512 thrpt 12 871.237 ± 80.083 ops/ms
RandomGeneratorNextBytes.testNextBytes Random 1000 thrpt 12 432.025 ± 28.046 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 1 thrpt 12 120484.263 ± 4843.672 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 2 thrpt 12 107474.776 ± 4915.841 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 3 thrpt 12 104832.882 ± 9039.199 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 4 thrpt 12 106411.957 ± 4440.441 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 6 thrpt 12 97407.747 ± 13756.916 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 7 thrpt 12 87383.519 ± 3631.554 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 9 thrpt 12 53060.202 ± 2519.723 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 10 thrpt 12 48562.023 ± 3260.066 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 48 thrpt 12 23700.122 ± 1973.472 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 512 thrpt 12 2635.817 ± 129.181 ops/ms
RandomGeneratorNextBytes.testNextBytes L32X64MixRandom 1000 thrpt 12 1415.250 ± 55.821 ops/ms
The benchmark results are somewhat weird: L32X64MixRandom has drastically different results even for small sizes that don't involve multi-byte writes.
These are unbelievable results.
In your results, even for small byte arrays (length < 8), ByteArrayLittleEndian is much slower than Unsafe. This is very strange because Unsafe or ByteArrayLittleEndian are not actually called for small byte arrays.
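The reasoning can be checked with a little arithmetic. Assuming the fill is split into whole 8-byte words (written with the multi-byte helper) and a byte-wise tail, the sketch below prints how many of each a given length needs. `SmallArraySketch`, `fullWords` and `tailBytes` are hypothetical helpers, and the word size of 8 is an assumption based on the "length < 8" remark; a variant built on 4-byte ints would shift the threshold to 4.

```java
class SmallArraySketch {
    /** Number of whole words that would go through the multi-byte write path. */
    static int fullWords(int len, int wordBytes) {
        return len / wordBytes;
    }

    /** Number of leftover bytes that would be written one at a time. */
    static int tailBytes(int len, int wordBytes) {
        return len % wordBytes;
    }

    public static void main(String[] args) {
        // Lengths taken from the benchmark tables above.
        int[] lengths = {1, 2, 3, 4, 6, 7, 9, 10, 48, 512, 1000};
        for (int len : lengths) {
            System.out.println("length " + len
                    + ": words=" + fullWords(len, Long.BYTES)
                    + ", tail bytes=" + tailBytes(len, Long.BYTES));
        }
    }
}
```

For every length below 8 the word count is zero, so under this assumption the two branches would execute the same byte-wise code and a large gap between them points at something other than the write helper.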
On a side note, I think you can become an author in the JDK project: https://openjdk.org/guide/#becoming-an-author
@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.
/open
@Glavo This pull request is now open
@Glavo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@Glavo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.
/open