jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8282365: Consolidate and improve division by constant idealizations

Open merykitty opened this issue 2 years ago • 73 comments

This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of DivI to get a more efficient series when the magic constant overflows an int32.

In general, the idea behind a signed division transformation is that for a positive constant d, we would need to find constants c and m so that:

floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1)
ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2)

The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication x * c, so the constant overflow and we need to add back the dividend as in DivLNode::Ideal cases. However, for int32 division, x * c cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.

For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant c calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:

c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3)
c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4)

which means that either x * c1 never overflows an uint64 or (x + 1) * c2 never overflows an uint64. And we can perform a full multiplication.

For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.

More tests are added to cover the possible patterns.

Please take a look and have some reviews. Thank you very much.


Progress

  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [x] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8282365: Consolidate and improve division by constant idealizations (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947
$ git checkout pull/9947

Update a local copy of the PR:
$ git checkout pull/9947
$ git pull https://git.openjdk.org/jdk.git pull/9947/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9947

View PR using the GUI difftool:
$ git pr show -t 9947

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9947.diff

Webrev

Link to Webrev Comment

merykitty avatar Aug 19 '22 17:08 merykitty

:wave: Welcome back merykitty! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Aug 19 '22 17:08 bridgekeeper[bot]

/label hotspot-compiler

merykitty avatar Aug 19 '22 18:08 merykitty

@merykitty The hotspot-compiler label was successfully added.

openjdk[bot] avatar Aug 19 '22 19:08 openjdk[bot]

Webrevs

mlbridge[bot] avatar Aug 19 '22 20:08 mlbridge[bot]

Great work @merykitty! Could you pls also list the performance improvement due to the changes you made?

vamsi-parasa avatar Aug 22 '22 01:08 vamsi-parasa

@merykitty this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout unsignedDiv
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk[bot] avatar Aug 23 '22 01:08 openjdk[bot]

The updated benchmark shows improvements in unsigned divisions by a constant, also signed divisions where the magic constant lies outside the limit of an i32 also show improvements in comparison with the baseline:

                                                                              Before              After
Benchmark                                    (BUFFER_SIZE)  Mode  Cnt     Score     Error     Score    Error  Units
IntegerDivMod.testDivideConstant                      1024  avgt   15  1017.464 ±  43.674   899.816 ± 14.511  ns/op
IntegerDivMod.testDivideUnsignedConstant              1024  avgt   15  2125.401 ±  54.692   762.749 ± 17.280  ns/op
IntegerDivMod.testRemainderUnsignedConstant           1024  avgt   15  2155.461 ±  60.759   918.103 ± 10.772  ns/op
LongDivMod.testDivideUnsignedConstant                 1024  avgt   15  6668.891 ±  81.072   949.033 ± 15.485  ns/op
LongDivMod.testRemainderUnsignedConstant              1024  avgt   15  7275.035 ± 135.520  1355.340 ± 17.853  ns/op

merykitty avatar Aug 23 '22 01:08 merykitty

I also asked to have separate positive and negative divisor values in JMH tests. In addition to mixed ones.

vnkozlov avatar Sep 27 '22 20:09 vnkozlov

@vnkozlov I have addressed your reviews in the last commit. Thanks very much.

merykitty avatar Oct 03 '22 16:10 merykitty

/reviewers 2

vnkozlov avatar Oct 04 '22 18:10 vnkozlov

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8282365: Consolidate and improve division by constant idealizations

Reviewed-by: kvn, epeter, kbarrett

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Oct 04 '22 18:10 openjdk[bot]

@vnkozlov The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

openjdk[bot] avatar Oct 04 '22 18:10 openjdk[bot]

Thanks for your review.

merykitty avatar Oct 05 '22 00:10 merykitty

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Nov 03 '22 07:11 bridgekeeper[bot]

I have followed @rose00 's suggestion by refactoring the magic constant calculation into a separate header and creating test cases for these functions. Please take a look and leave reviews. Thanks a lot.

merykitty avatar Nov 29 '22 14:11 merykitty

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jan 09 '23 22:01 bridgekeeper[bot]

@merykitty This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper[bot] avatar Feb 07 '23 04:02 bridgekeeper[bot]

/open

merykitty avatar Mar 27 '23 02:03 merykitty

@merykitty This pull request is now open

openjdk[bot] avatar Mar 27 '23 02:03 openjdk[bot]

Sorry for the late updates, I have updated the gtest to emulate the transformation done in DivNode::Ideals and verify the results with numerous input values. This uncover a bug that a shift value too large warps back to 0, which gives incorrect results. As a result, I have added multiple asserts to ensure the shift does not overflow when it should not and idealisations to 0 in the cases it does.

Thanks a lot.

merykitty avatar Mar 27 '23 02:03 merykitty

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Apr 26 '23 19:04 bridgekeeper[bot]

@merykitty This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper[bot] avatar May 25 '23 02:05 bridgekeeper[bot]

/open

merykitty avatar Jun 10 '23 01:06 merykitty

@merykitty This pull request is now open

openjdk[bot] avatar Jun 10 '23 01:06 openjdk[bot]

May I have a second review for this patch, please?

merykitty avatar Jun 10 '23 01:06 merykitty

@merykitty I just discussed the testing with @TobiHartmann . He just came across this test: test/hotspot/jtreg/compiler/c2/TestUnsignedByteCompare1.java. The cool thing is that you can "simulate" constants with MethodHandles.constant. At runtime apparently the invocation specualte-and-traps it to a constant value. That means you can just set a new value, it depopts, and hopefully eventually re-compiles with the next constants.

You could easily set up one of these tests per node. Any maybe throw in some interesting ranges for the dividend.

An interesting experiment would be to have a IR test that works with a random constant, and then have an IR rule that fails if we find adiv node. At least for those cases where that should work. And then you can easily compare the div results with a non-compiled method that computes the same value.

eme64 avatar Jul 03 '23 13:07 eme64

@eme64 Thanks a lot for taking a look at this patch, I will address your remaining comments soon.

The basic idea of the transformation in javaArithmetic.hpp is to find M and s such that x / c = floor(x * M / 2**s) for every interesting value of x. The remaining transformation in divnode.cpp is to convert this calculation from integer arithmetic to modular arithmetic. This is easy if the representative in the congruence class of an operand is always equal to itself, in which case we can do the calculation directly. For other cases, we have to do additional calculation to take into consideration the difference between arithmetic calculations in 2 domains.

merykitty avatar Jul 12 '23 17:07 merykitty

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Aug 10 '23 00:08 bridgekeeper[bot]

@merykitty I'm mostly out of the office until September 9 (FYI).

It would be really cool if this made it in. I'm currently playing with MethodHandles.constant, and it is really easy to have "random" compile time constants.

eme64 avatar Aug 10 '23 11:08 eme64

@eme64 Thanks a lot for your support and sorry for the delay, I came across this article which motivated me to come up with a more general and optimal algorithm. This also consolidates the magic constant calculation across different division operations.

The underlying theory is given in the function magic_divide_constant, which I will restate here. Given positive integers d <= N, call v the largest nonnegative integer not larger than N such that v + 1 is divisible by d then:

For all nonnegative integers c, m such that:

m <= c * d < m + m / v

We have:

floor(x / d) = floor(x * c / m) for all integers x in [0, N] (1)

For all nonnegative integers c, m such that:

m < c * d <= m + m / v

We have:

ceil(x / d) = floor(x * c / m) + 1 for all integers x in [-N, 0) (2)

As a result, to calculate the constant c, m corresponding to a division x / d with x in [lo, hi], we divide the dividend range into negative and nonnegative intervals [lo, 0) and [0, hi]. Then, call v_neg the largest integer not larger than -lo such that v_neg + 1 is divisible by d, and v_pos the largest integer not larger than hi such that v_pos + 1 is divisible by d. We then need to find constant c, m such that:

m <= c * d < m + m / v_pos
m < c * d <= m + m / v_neg

This is applicable for both signed and unsigned types (with unsigned types we do not need to consider the negative range).

Substitute x = v, x = v - d + 1 into (1), as well as x = -v, x = -v + d - 1 into (2) shows that these bounds are indeed optimal.

merykitty avatar Aug 19 '23 08:08 merykitty