jdk
jdk copied to clipboard
8282365: Consolidate and improve division by constant idealizations
This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of DivI
to get a more efficient series when the magic constant overflows an int32.
In general, the idea behind a signed division transformation is that for a positive constant d
, we would need to find constants c
and m
so that:
floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1)
ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2)
The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication x * c
, so the constant overflow and we need to add back the dividend as in DivLNode::Ideal
cases. However, for int32 division, x * c
cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant c
calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3)
c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4)
which means that either x * c1
never overflows an uint64 or (x + 1) * c2
never overflows an uint64. And we can perform a full multiplication.
For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.
More tests are added to cover the possible patterns.
Please take a look and have some reviews. Thank you very much.
Progress
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [x] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)
Issue
- JDK-8282365: Consolidate and improve division by constant idealizations (Enhancement - P4)
Reviewers
- Vladimir Kozlov (@vnkozlov - Reviewer) ⚠️ Review applies to 297d2792
- Emanuel Peter (@eme64 - Reviewer) ⚠️ Review applies to fd01489f
- Kim Barrett (@kimbarrett - Reviewer) ⚠️ Review applies to ed0ca1c3
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947
$ git checkout pull/9947
Update a local copy of the PR:
$ git checkout pull/9947
$ git pull https://git.openjdk.org/jdk.git pull/9947/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9947
View PR using the GUI difftool:
$ git pr show -t 9947
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9947.diff
Webrev
:wave: Welcome back merykitty! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
/label hotspot-compiler
@merykitty
The hotspot-compiler
label was successfully added.
Webrevs
- 49: Full (2a31e27a)
- 48: Full - Incremental (3068d7e5)
- 47: Full (ed0ca1c3)
- 46: Full - Incremental (26e5c6e7)
- 45: Full - Incremental (1400de7b)
- 44: Full - Incremental (6634cd46)
- 43: Full - Incremental (db80bd4a)
- 42: Full - Incremental (bba52b74)
- 41: Full - Incremental (0f2c57c7)
- 40: Full - Incremental (57265bbd)
- 39: Full (05981133)
- 38: Full - Incremental (b56dc2d4)
- 37: Full - Incremental (a8389af1)
- 36: Full - Incremental (567eed97)
- 35: Full - Incremental (75a2c172)
- 34: Full - Incremental (e8b54dad)
- 33: Full (501494a1)
- 32: Full (529bd0f9)
- 31: Full - Incremental (b113e605)
- 30: Full - Incremental (297d2792)
- 29: Full - Incremental (fd01489f)
- 28: Full - Incremental (f63a0816)
- 27: Full - Incremental (42662b36)
- 26: Full - Incremental (deece1aa)
- 25: Full - Incremental (6004299b)
- 24: Full - Incremental (455e9a93)
- 23: Full - Incremental (059a01b3)
- 22: Full (27dfbbae)
- 21: Full (62943d1f)
- 20: Full - Incremental (fac857c6)
- 19: Full - Incremental (07343562)
- 18: Full - Incremental (6f8eea31)
- 17: Full - Incremental (1ae865f0)
- 16: Full (c48d96be)
- 15: Full (eb1f5dd9)
- 14: Full - Incremental (f2086507)
- 13: Full (e44625d6)
- 12: Full - Incremental (9b0f730a)
- 11: Full - Incremental (4b359d3a)
- 10: Full - Incremental (afe37a99)
- 09: Full (3c03f89b)
- 08: Full - Incremental (058c2ec5)
- 07: Full - Incremental (1ad99969)
- 06: Full - Incremental (0cb30b8d)
- 05: Full - Incremental (53c07784)
- 04: Full - Incremental (fccfe7ec)
- 03: Full (156f65c0)
- 02: Full (3052b4ee)
- 01: Full - Incremental (1551392c)
- 00: Full (52309196)
Great work @merykitty! Could you pls also list the performance improvement due to the changes you made?
@merykitty this pull request can not be integrated into master
due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout unsignedDiv
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
The updated benchmark shows improvements in unsigned divisions by a constant, also signed divisions where the magic constant lies outside the limit of an i32 also show improvements in comparison with the baseline:
Before After
Benchmark (BUFFER_SIZE) Mode Cnt Score Error Score Error Units
IntegerDivMod.testDivideConstant 1024 avgt 15 1017.464 ± 43.674 899.816 ± 14.511 ns/op
IntegerDivMod.testDivideUnsignedConstant 1024 avgt 15 2125.401 ± 54.692 762.749 ± 17.280 ns/op
IntegerDivMod.testRemainderUnsignedConstant 1024 avgt 15 2155.461 ± 60.759 918.103 ± 10.772 ns/op
LongDivMod.testDivideUnsignedConstant 1024 avgt 15 6668.891 ± 81.072 949.033 ± 15.485 ns/op
LongDivMod.testRemainderUnsignedConstant 1024 avgt 15 7275.035 ± 135.520 1355.340 ± 17.853 ns/op
I also asked to have separate positive and negative divisor values in JMH tests. In addition to mixed ones.
@vnkozlov I have addressed your reviews in the last commit. Thanks very much.
/reviewers 2
@merykitty This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8282365: Consolidate and improve division by constant idealizations
Reviewed-by: kvn, epeter, kbarrett
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been no new commits pushed to the master
branch. If another commit should be pushed before you perform the /integrate
command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master
branch, type /integrate in a new comment.
@vnkozlov The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).
Thanks for your review.
@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
I have followed @rose00 's suggestion by refactoring the magic constant calculation into a separate header and creating test cases for these functions. Please take a look and leave reviews. Thanks a lot.
@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@merykitty This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open
pull request command.
/open
@merykitty This pull request is now open
Sorry for the late updates, I have updated the gtest to emulate the transformation done in DivNode::Ideal
s and verify the results with numerous input values. This uncover a bug that a shift value too large warps back to 0, which gives incorrect results. As a result, I have added multiple asserts to ensure the shift does not overflow when it should not and idealisations to 0
in the cases it does.
Thanks a lot.
@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@merykitty This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open
pull request command.
/open
@merykitty This pull request is now open
May I have a second review for this patch, please?
@merykitty I just discussed the testing with @TobiHartmann . He just came across this test:
test/hotspot/jtreg/compiler/c2/TestUnsignedByteCompare1.java
.
The cool thing is that you can "simulate" constants with MethodHandles.constant
. At runtime apparently the invocation specualte-and-traps it to a constant value. That means you can just set a new value, it depopts, and hopefully eventually re-compiles with the next constants.
You could easily set up one of these tests per node. Any maybe throw in some interesting ranges for the dividend
.
An interesting experiment would be to have a IR test that works with a random constant, and then have an IR rule that fails if we find adiv
node. At least for those cases where that should work. And then you can easily compare the div results with a non-compiled method that computes the same value.
@eme64 Thanks a lot for taking a look at this patch, I will address your remaining comments soon.
The basic idea of the transformation in javaArithmetic.hpp
is to find M
and s
such that x / c = floor(x * M / 2**s)
for every interesting value of x
. The remaining transformation in divnode.cpp
is to convert this calculation from integer arithmetic to modular arithmetic. This is easy if the representative in the congruence class of an operand is always equal to itself, in which case we can do the calculation directly. For other cases, we have to do additional calculation to take into consideration the difference between arithmetic calculations in 2 domains.
@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@merykitty I'm mostly out of the office until September 9 (FYI).
It would be really cool if this made it in. I'm currently playing with MethodHandles.constant
, and it is really easy to have "random" compile time constants.
@eme64 Thanks a lot for your support and sorry for the delay, I came across this article which motivated me to come up with a more general and optimal algorithm. This also consolidates the magic constant calculation across different division operations.
The underlying theory is given in the function magic_divide_constant
, which I will restate here. Given positive integers d <= N
, call v
the largest nonnegative integer not larger than N
such that v + 1
is divisible by d
then:
For all nonnegative integers c
, m
such that:
m <= c * d < m + m / v
We have:
floor(x / d) = floor(x * c / m) for all integers x in [0, N] (1)
For all nonnegative integers c
, m
such that:
m < c * d <= m + m / v
We have:
ceil(x / d) = floor(x * c / m) + 1 for all integers x in [-N, 0) (2)
As a result, to calculate the constant c
, m
corresponding to a division x / d
with x
in [lo, hi]
, we divide the dividend range into negative and nonnegative intervals [lo, 0)
and [0, hi]
. Then, call v_neg
the largest integer not larger than -lo
such that v_neg + 1
is divisible by d
, and v_pos
the largest integer not larger than hi
such that v_pos + 1
is divisible by d
. We then need to find constant c
, m
such that:
m <= c * d < m + m / v_pos
m < c * d <= m + m / v_neg
This is applicable for both signed and unsigned types (with unsigned types we do not need to consider the negative range).
Substitute x = v, x = v - d + 1
into (1), as well as x = -v, x = -v + d - 1
into (2) shows that these bounds are indeed optimal.