jdk
jdk copied to clipboard
8289943: Simplify some object allocation merges
Hi there, can I please get some feedback on this approach to simplify object allocation merges in order to promote Scalar Replacement of the objects involved in the merge?
The basic idea for this approach was discussed in this thread and it consists of:
- Identify Phi nodes that merge object allocations and replace them with a new IR node called ReducedAllocationMergeNode (RAM node).
- Scalar Replace the incoming allocations to the RAM node.
- Scalar Replace the RAM node itself.
There are a few conditions for doing the replacement of the Phi by a RAM node though - Although I plan to work on removing them in subsequent PRs:
- The only supported users of the original Phi are AddP->Load, SafePoints/Traps, DecodeN.
These are the critical parts of the implementation and I'd appreciate it very much if you could tell me if what I implemented isn't violating any C2 IR constraints:
- The way I identify/use the memory edges that will be used to find the last stored values to the merged object fields.
- The way I check if there is an incoming Allocate node to the original Phi node.
- The way I check if there is no store to the merged objects after they are merged.
Testing:
- Windows/Linux/MAC fastdebug/release
- hotspot_all
- tier1
- Renaissance
- dacapo
- new IR-based tests
Progress
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [ ] Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)
Issue
- JDK-8289943: Simplify some object allocation merges
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9073/head:pull/9073
$ git checkout pull/9073
Update a local copy of the PR:
$ git checkout pull/9073
$ git pull https://git.openjdk.org/jdk pull/9073/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9073
View PR using the GUI difftool:
$ git pr show -t 9073
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9073.diff
:wave: Welcome back JohnTortugo! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@JohnTortugo this pull request can not be integrated into master
due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout allocation-merges
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
⚠️ @JohnTortugo This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch>
where <project>
is the name of another project in the OpenJDK organization (for example Merge jdk:master
).
@JohnTortugo The following label will be automatically applied to this pull request:
-
hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Hi there, can someone please take a look and let me know if this is going in a reasonable direction?
@vnkozlov - is this more or less on the lines of what you were thinking?
Webrevs
- 13: Full (d26909fa)
- 12: Full - Incremental (9e7163a3)
- 11: Full - Incremental (a03b91a7)
- 10: Full - Incremental (203b40b1)
- 09: Full - Incremental (abd474e6)
- 08: Full (c7d00d0b)
- 07: Full - Incremental (db0ae5fd)
- 06: Full - Incremental (2d444b10)
- 05: Full - Incremental (e68c7926)
- 04: Full - Incremental (aebb027d)
- 03: Full (30314766)
- 02: Full (9d1eed25)
- 01: Full - Incremental (8f802f76)
- 00: Full (e1d506f3)
This is good starting point. To have new "Phi" type node to collect information about merged allocation. I need more time to dive into changes to give review.
Thanks for taking the time to look into this!
I currently found one issue we need discuss - merge allocation of different subclasses (of the same parent class) which may have different number of fields. Current implementation assume objects are the same but I don't see the check for it during RAM node creation. May be we should have it at this initial implementation.
Got it. I'll create some tests for this and see what happens.
What about adjust_scalar_replaceable_state() code mark allocation as non-SR if they are merged?
I didn't get this part. Can you please clarify?
About input memory slices. Since merged allocation are SR we should have some new memory Phi created in EA split_memory_phi() which we can try to identify instead of adding all memory slices we find (I am talking about RAM constructor).
I'll take a look into that. Thanks for the suggestion.
I see you bailed compilation to recompile in case you can't remove RAM node. I think it is fine for initial implementation.
Great.
What about adjust_scalar_replaceable_state() code mark allocation as non-SR if they are merged?
I didn't get this part. Can you please clarify?
My bad. After looking more on changes I noticed that you exit compute_escape()
before adjust_scalar_replaceable_state()
is called. So my comment is null.
After looking more on changes I noticed that you exit compute_escape() before adjust_scalar_replaceable_state() is called.
No worries. I'm currently working to make reduce_allocation_merges
be executed as part of compute_escape so that we can take benefit from iterative EA executions.
I run into an error
$make test TEST=compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java CONF=linux-x86_64-server-fastdebug
One or more @IR rules failed:
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesTests.testPollutedPolymorphic(boolean,int)" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(applyIf={}, applyIfAnd={}, failOn={}, applyIfOr={}, counts={"(.*precise .*\\R((.*(?i:mov|xorl|nop|spill).*|\\s*|.*LGHI.*)\\R)*.*(?i:call,static).*wrapper for: _new_instance_Java)", "2"}, applyIfNot={})"
- counts: Graph contains wrong number of nodes:
* Regex 1: (.*precise .*\R((.*(?i:mov|xorl|nop|spill).*|\s*|.*LGHI.*)\R)*.*(?i:call,static).*wrapper for: _new_instance_Java)
- Failed comparison: [found] 0 = 2 [given]
- No nodes matched!
>>> Check stdout for compilation output of the failed methods
#############################################################
- To only run the failed tests use -DTest, -DExclude,
and/or -DScenarios.
- To also get the standard output of the test VM run with
-DReportStdout=true or for even more fine-grained logging
use -DVerbose=true.
#############################################################
compiler.lib.ir_framework.driver.irmatching.IRViolationException: There were one or multiple IR rule failures. Please check stderr for more information.
at compiler.lib.ir_framework.driver.irmatching.IRMatcher.throwIfNoSafepointWhilePrinting(IRMatcher.java:91)
at compiler.lib.ir_framework.driver.irmatching.IRMatcher.reportFailures(IRMatcher.java:82)
at compiler.lib.ir_framework.driver.irmatching.IRMatcher.applyIRRules(IRMatcher.java:54)
at compiler.lib.ir_framework.driver.irmatching.IRMatcher.<init>(IRMatcher.java:43)
at compiler.lib.ir_framework.TestFramework.runTestVM(TestFramework.java:729)
at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:698)
at compiler.lib.ir_framework.TestFramework.start(TestFramework.java:329)
at compiler.lib.ir_framework.TestFramework.runWithFlags(TestFramework.java:237)
at compiler.c2.irTests.scalarReplacement.AllocationMergesTests.main(AllocationMergesTests.java:37)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:578)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
at java.base/java.lang.Thread.run(Thread.java:1596)
2 objects do get scalarized in testPollutedPolymorphic
.
Thanks for taking a look @navyxliu . I'm working on an improvement (and fixing a bug) and I'll include your suggestions in the next push.
@JohnTortugo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
Hi @vnkozlov , @navyxliu - I think I addressed all your comments so far. Can I ask you to please take a look again? I made several modifications for fixing bugs and improving the coverage of the optimization. In this new version the simplification of allocations is performed in every EA iteration, the memory edges in RAM are updated after split_unique_types is computed and only allocations merged inputs of the same _klass
are simplified.
I tested this in several benchmarks and didn't find any regression. Nevertheless, I'm still going to run more tests with different JVM configurations.
Can I ask you (@TobiHartmann , @rwestrel , @iwanowww) to also take a look?
I currently found one issue we need discuss - merge allocation of different subclasses (of the same parent class) which may have different number of fields. Current implementation assume objects are the same but I don't see the check for it during RAM node creation. May be we should have it at this initial implementation.
Hi again @vnkozlov - The current implementation requires the Klass of all inputs to the Phi to be the same. I'm trying to lift that restriction as it would increase the number of cases where the merge can be removed. I've been trying to create an example where I'd need to enforce the allocations to be of the same type but I couldn't figure one out yet. Do you have an example where that must be enforced? Note that I only need fields accessed through the merge Phi node, which means only fields present in the superclass AFAIU.
Hi @JohnTortugo Yes, I think you can lift restriction for all inputs have the same class. I looked more and you correctly set RAM's type to original Phi's type - it should be correct type (base class, which could be also one of inputs). You correctly register fields offsets based on this (Phi) class during RAM make. C2 type system and bytecode should guarantee that Phi's type is correct (meet of all inputs). It could be more narrowed during CCP phase but type you got serves correctly for this implementation. May be assert that it is not interface - or skip such cases if you find them.
You can add assert(input_t->is_instptr()->instance_klass()->is_subtype_of(klass))
instead of klass equals check.
What confused me is code in PhaseMacroExpand::scalar_replacement()
where you process Allocation from one of inputs which may have more fields than RAM registered. But you correctly have !ram->needs_field(offset)
check which I missed before. That why I commented.
Allocations in testPollutedPolymorphic()
are removed because both classes have the same Shape
class which have all fields. Would be interesting if l
field is declared only in both subclasses.
@vnkozlov - Thank you for clarifying that. I've been playing with lifting the restriction and I actually found a corner case:
public static Class test(boolean c1, boolean c2, boolean c3, int x, int y, int w, int z) {
Animal s = new Dog(x, y, z);
if (c1) {
s = new Cat("Fisker");
}
Unloaded u = new Unloaded(); // assumes this is converted to a uncommon_trap(unloaded, reinterpret)
return s.getClass();
}
It seems that when merging allocations of different subtypes I'll need to add a special Phi
node merging the Klass
of the input allocations and assign the output of that Phi
to SafepointScalarReplacedNode
. If I don't do that, the method above will return Animal.class instead of Dog.class
or Cat.class
. I'm wondering if I'll actually have to do the same for the Header/Mark word of the input allocations.
@JohnTortugo Yes, you would need to construct Phi when you replace RAM with SafePointScalarObjectNode
. Hmm, may be you would need to construct Phi in other cases too (getClass intrinsic).
Add cases when class is loaded from argument for allocation: Unsafe.allocateInstance()
and Object.clone()
to test class Load instead of simple constant on some paths of such Phi.
Why you need Phi for mark word? For identity check (hashCode/identityHashCode intrinsics)?
@JohnTortugo Yes, you would need to construct Phi when you replace RAM with SafePointScalarObjectNode. Hmm, may be you would need to construct Phi in other cases too (getClass intrinsic).
Yes. I'll take a look into getClass
intrinsic. I thought that just adding input to SafePointScalarObjectNode
+Safepoint with a Phi of the input allocations Klass fields would be enough for the code to correctly access fields/methods of the input objects. However, it looks like the Klass of the object serialized/deserialized from SafePointScalarObjectNode is the type of SafePointScalarObjectNode
, not the Klass field input edge set in the Safepoint. In other words, the object represented in SafePointScalarObjectNode needs to be exact AFAIU.
Add cases when class is loaded from argument for allocation: Unsafe.allocateInstance() and Object.clone() to test class Load instead of simple constant on some paths of such Phi.
I'll do that. Thanks for the tip.
Why you need Phi for mark word? For identity check (hashCode/identityHashCode intrinsics)?
Yes, I was wondering about a case where I'd need any of the information present in the Mark word. The code in the PR is able to solve merges where only some of the merged allocations are removed. I believe I'll need, at least, to preserve the Mark word of the allocations not removed.
Hi again @vnkozlov, I ended up giving up on adding support for merging allocations of different subclasses. I think the current version of the patch already covers some reasonable ground and it seems that adding support for merging allocations of subclasses will require more widespread changes.
Also, I wasn't able to find a case where the patch fail due to an incorrect/missing Mark word. FYI we have been testing the current patch with all JTREG tests, TCK tests, SPECJBB, Renaissance, DaCapo, and some customer services. I currently don't see any failure.
I also added a few JMH tests to show up what kind of improvements we can expect with the proposed changes:
Benchmark Mode Cnt Score Error Units
AllocationsMerge.IfElseInLoop avgt 5 93,839,709.289 ± 16846405.548 ns/op
AllocationsMerge.MergeAndIterative avgt 5 160,988.682 ± 57389.599 ns/op
AllocationsMerge.NestedObjectsObject avgt 5 311,523.642 ± 59622.703 ns/op
AllocationsMerge.SimpleMerge avgt 5 142,398.057 ± 57010.144 ns/op
AllocationsMerge.TrapAfterMerge avgt 5 333,061.738 ± 25645.896 ns/op
AllocationsMerge.WithAllocationsMergeEnabled.IfElseInLoop avgt 5 11,964,798.596 ± 1661843.496 ns/op
AllocationsMerge.WithAllocationsMergeEnabled.MergeAndIterative avgt 5 33,118.428 ± 2995.224 ns/op
AllocationsMerge.WithAllocationsMergeEnabled.NestedObjectsObject avgt 5 155,808.370 ± 30379.710 ns/op
AllocationsMerge.WithAllocationsMergeEnabled.SimpleMerge avgt 5 31,266.188 ± 13476.634 ns/op
AllocationsMerge.WithAllocationsMergeEnabled.TrapAfterMerge avgt 5 208,694.038 ± 18667.023 ns/op
Please let me know if there is something else that I can do to make it easier to review the proposed changes.
Thank you for taking a look @vnkozlov . I'll address your comments asap.
Hi @vnkozlov - I pushed some changes that I think address the comments that you made. I also had to add another constraint to reducing the allocations: for now, I won't be reducing merges of SR and NSR allocations. I'll definitely work on that in a subsequent patch.
Thank you again for reviewing and I'd appreciate it very much if you could check if the patch doesn't break any of your tests.
@JohnTortugo Good news is that I ran from tier1 to tier4 and with ReduceAllocationMerges
enabled and I did not get new failures except in your new test compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java
when it run with -ea -esa -XX:CompileThreshold=100 -XX:-TieredCompilation
flags.
I put failed test's output to RFE.
/reviewers 2
We need others to look on this too at this stage.
@vnkozlov The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).
And I will submit performance runs.
Thanks a lot for the comments @vnkozlov . I'll push new changes addressing your suggestions later today.
And I will submit performance runs.
Performance results are mixed bag. Saw some improvements on x64 in Renaissance but they gone after rerun. I see some significant regression in Dacapo on Linux Aarch64 (Ampere CPU) but don't see it at all on MacOS M1.
Hi @vnkozlov & @merykitty - thank you for the reviews. I'm working to address your comments. I had to take some time off of work due to personal issues but I believe I can address your comments during the weekend or earlier next week.
@vnkozlov - Thank you so much for running the perf. tests! Can you please give me more details about the experiment? What metric did you analyze, which version of Dacapo, and was it any specific benchmark that regressed or all of them? Which JVM parameters?
Regards, Cesar