jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8305895: Implement JEP 450: Compact Object Headers (Experimental)

Open rkennke opened this issue 1 year ago β€’ 65 comments

This is the main body of the JEP 450: Compact Object Headers (Experimental).

It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.

Main changes:

  • Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
  • The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
  • Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
  • Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
  • Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
  • Arrays will now store their length at offset 8.
  • CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant.
  • Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (or at least I could not find it), and also I fear that doing so could mess with optimizations. This may be useful to revisit. OTOH, the approach that I have taken works and is similar to DecodeNKlass and similar instructions.

Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests.) The below testing has been run many times, but not with this exact base version of the JDK. I want to hold off the full testing until we also have the Tiny Class-Pointers PR lined-up, and test with that.

  • [ ] tier1 (x86_64)
  • [ ] tier2 (x86_64)
  • [ ] tier3 (x86_64)
  • [ ] tier4 (x86_64)
  • [ ] tier1 (aarch64)
  • [ ] tier2 (aarch64)
  • [ ] tier3 (aarch64)
  • [ ] tier4 (aarch64)
  • [x] tier1 (x86_64) +UseCompactObjectHeaders
  • [x] tier2 (x86_64) +UseCompactObjectHeaders
  • [x] tier3 (x86_64) +UseCompactObjectHeaders
  • [x] tier4 (x86_64) +UseCompactObjectHeaders
  • [x] tier1 (aarch64) +UseCompactObjectHeaders
  • [x] tier2 (aarch64) +UseCompactObjectHeaders
  • [x] tier3 (aarch64) +UseCompactObjectHeaders
  • [x] tier4 (aarch64) +UseCompactObjectHeaders
  • [x] Running as a backport in production since >1 year.

Progress

  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change requires a JEP request to be targeted
  • [ ] Change must be properly reviewed (3 reviews required, with at least 1 Reviewer, 2 Authors)

Issues

  • JDK-8305895: Implement JEP 450: Compact Object Headers (Experimental) (Enhancement - P4)
  • JDK-8294992: JEP 450: Compact Object Headers (Experimental) (JEP)
  • JDK-8306000: Add experimental -XX:+UseCompactObjectHeaders flag (CSR) (Withdrawn)

Reviewers

Contributors

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677
$ git checkout pull/20677

Update a local copy of the PR:
$ git checkout pull/20677
$ git pull https://git.openjdk.org/jdk.git pull/20677/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20677

View PR using the GUI difftool:
$ git pr show -t 20677

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20677.diff

Webrev

Link to Webrev Comment

rkennke avatar Aug 22 '24 13:08 rkennke

:wave: Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Aug 22 '24 13:08 bridgekeeper[bot]

@rkennke This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8305895: Implement JEP 450: Compact Object Headers (Experimental)

Co-authored-by: Sandhya Viswanathan <[email protected]>
Co-authored-by: Martin Doerr <[email protected]>
Co-authored-by: Hamlin Li <[email protected]>
Co-authored-by: Thomas Stuefe <[email protected]>
Co-authored-by: Amit Kumar <[email protected]>
Co-authored-by: Stefan Karlsson <[email protected]>
Co-authored-by: Coleen Phillimore <[email protected]>
Co-authored-by: Axel Boldt-Christmas <[email protected]>
Reviewed-by: coleenp, stefank, stuefe, phh, ihse, lmesnik, tschatzl, matsaave, rcastanedalo, vpaprotski, yzheng, egahlin

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 20 new commits pushed to the master branch:

  • 72f67fb16a2470fc40c6ddc3700ba92f98a62096: 8343411: Test fail on Windows due to "An established connection was aborted by the software in your host machine"
  • 7d6a2f3740bf42652bdf05bb922d1f2b2ae60d6a: 8342860: Fix more NULL usage backsliding
  • f6edfe58d6931b058a5fec722615740818711065: 8343506: [s390x] multiple test failures with ubsan
  • 96eed7fa6c025374bc10039bca2949a76d78f890: 8343306: javac is failing to determine if a class and a sealed interface are disjoint
  • 0c281acfb4c87436096cb562d70f800dffa3671a: 8343754: Problemlist jdk/jfr/event/oldobject/TestShenandoah.java after JDK-8279016
  • 2e58ede18c7cfe7364a8d6a630989b0ff2ea6447: 8341399: Add since checker tests to the langtools modules
  • c7f071cf36a6f064e293e82e7e5bb0abcc76ad70: 8343189: [REDO] JDK-8295269 G1: Improve slow startup due to predictor initialization
  • a9e53bb11788761277563d5912a9981932a7c1c9: 8343783: Improve asserts in concurrentHashTable.inline.hpp
  • bf5c3ce2808c65329ff0ea5ef37b980c3b3029df: 8272780: ServerNotifForwarder.removeNotificationListener() incorrect exception handling
  • a10b1ccd377335354db7505e9944496729e539ce: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue
  • ... and 10 more: https://git.openjdk.org/jdk/compare/d3c042f9a0cd91e7cdf7f45cb0ea387f7ba9785b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➑️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Aug 22 '24 13:08 openjdk[bot]

@rkennke The following labels will be automatically applied to this pull request:

  • build
  • core-libs
  • graal
  • hotspot
  • serviceability
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Aug 22 '24 13:08 openjdk[bot]

/label remove core-libs /label add hotspot-gc /label add hotspot-runtime

rkennke avatar Aug 22 '24 13:08 rkennke

/jep JEP-450

rkennke avatar Aug 22 '24 13:08 rkennke

@rkennke The core-libs label was successfully removed.

openjdk[bot] avatar Aug 22 '24 13:08 openjdk[bot]

@rkennke The hotspot-gc label was successfully added.

openjdk[bot] avatar Aug 22 '24 13:08 openjdk[bot]

@rkennke The hotspot-runtime label was successfully added.

openjdk[bot] avatar Aug 22 '24 13:08 openjdk[bot]

@rkennke This pull request will not be integrated until the JEP-450 has been targeted.

openjdk[bot] avatar Aug 22 '24 13:08 openjdk[bot]

Webrevs

mlbridge[bot] avatar Aug 22 '24 14:08 mlbridge[bot]

@magicus The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 3 (with at least 1 Reviewer, 2 Authors).

openjdk[bot] avatar Aug 22 '24 19:08 openjdk[bot]

I've looked through the changes to the gc/ directory and have a couple of proposal changes. Please have a look: https://github.com/openjdk/jdk/compare/pr/20677...stefank:jdk:lilliput_review_gc_1

If these suggestions are accepted, it's only Shenandoah that still uses the forward_safe_* functions:

  // The following set of methods is used to access the mark-word and related
  // properties when the object may be forwarded. Be careful where and when
  // using this method. It assumes that the forwardee is installed in
  // the header as a plain pointer (or self-forwarded). In particular,
  // those methods can not deal with the encoded forwarding that is used
  // in Serial, Parallel, G1 and Shenandoah full-GCs.
private:
  inline Klass*   forward_safe_klass_impl(markWord m) const;
public:
  inline Klass*   forward_safe_klass() const;
  inline Klass*   forward_safe_klass(markWord m) const;
  inline size_t   forward_safe_size();
  inline void     forward_safe_init_mark();

It would be nice if these could all move over to Shenandoah, or somehow got evaporated into Shenandoah, just like I did for the other GCs.

stefank avatar Aug 23 '24 10:08 stefank

@rkennke this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8305895-v4
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk[bot] avatar Sep 10 '24 12:09 openjdk[bot]

@rkennke Can you please explain the changes in these tests:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

You added these IR rule restriction: @IR(applyIf = {"UseCompactObjectHeaders", "false"},

This means that if UseCompactObjectHeaders is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.

I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?

Of course some special tests may just rely on UseCompactObjectHeaders == false - but I would need some comments in the tests where you added it to justify why we add the restriction.

Please also test this patch with the cross combinations of UseCompactObjectHeaders and AlignVector enabled and disabled (and add VerifyAlignVector as well).

eme64 avatar Sep 11 '24 08:09 eme64

@rkennke Can you please explain the changes in these tests:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

You added these IR rule restriction: @IR(applyIf = {"UseCompactObjectHeaders", "false"},

This means that if UseCompactObjectHeaders is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.

I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?

Of course some special tests may just rely on UseCompactObjectHeaders == false - but I would need some comments in the tests where you added it to justify why we add the restriction.

Please also test this patch with the cross combinations of UseCompactObjectHeaders and AlignVector enabled and disabled (and add VerifyAlignVector as well).

IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.

I will re-evaluate those tests, and add comments or remove the restrictions.

rkennke avatar Sep 11 '24 13:09 rkennke

@rkennke Can you please explain the changes in these tests:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

You added these IR rule restriction: @IR(applyIf = {"UseCompactObjectHeaders", "false"}, This means that if UseCompactObjectHeaders is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? Of course some special tests may just rely on UseCompactObjectHeaders == false - but I would need some comments in the tests where you added it to justify why we add the restriction. Please also test this patch with the cross combinations of UseCompactObjectHeaders and AlignVector enabled and disabled (and add VerifyAlignVector as well).

IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.

I will re-evaluate those tests, and add comments or remove the restrictions.

If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;)

My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.

eme64 avatar Sep 11 '24 14:09 eme64

I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor.

 diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
 index fd198f54fc9..7aa4bd24948 100644
 --- a/src/hotspot/share/oops/instanceKlass.cpp
 +++ b/src/hotspot/share/oops/instanceKlass.cpp
@@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() {
 }
 
 InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) :
-  Klass(kind),
+  Klass(kind, (!parser.is_interface() && !parser.is_abstract())),
   _nest_members(nullptr),
   _nest_host(nullptr),
   _permitted_subclasses(nullptr),

coleenp avatar Sep 11 '24 21:09 coleenp

@rkennke Can you please explain the changes in these tests:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

You added these IR rule restriction: @IR(applyIf = {"UseCompactObjectHeaders", "false"}, This means that if UseCompactObjectHeaders is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? Of course some special tests may just rely on UseCompactObjectHeaders == false - but I would need some comments in the tests where you added it to justify why we add the restriction. Please also test this patch with the cross combinations of UseCompactObjectHeaders and AlignVector enabled and disabled (and add VerifyAlignVector as well).

IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. I will re-evaluate those tests, and add comments or remove the restrictions.

If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;)

My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.

Indeed, I could re-enable all tests in:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java

but unfortunately not those others:

> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset.

I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.

rkennke avatar Sep 12 '24 13:09 rkennke

@rkennke Can you please explain the changes in these tests:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

You added these IR rule restriction: @IR(applyIf = {"UseCompactObjectHeaders", "false"}, This means that if UseCompactObjectHeaders is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? Of course some special tests may just rely on UseCompactObjectHeaders == false - but I would need some comments in the tests where you added it to justify why we add the restriction. Please also test this patch with the cross combinations of UseCompactObjectHeaders and AlignVector enabled and disabled (and add VerifyAlignVector as well).

IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. I will re-evaluate those tests, and add comments or remove the restrictions.

If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.

Indeed, I could re-enable all tests in:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java

but unfortunately not those others:

> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset.

I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.

Excellent, that is what I hoped for! Thanks for filing the bug, I'll look into it once this is integrated. You should probably mark it as "blocked by", not "related to" ;)

eme64 avatar Sep 12 '24 13:09 eme64

@coleenp

I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor.

I solved this differently (Roman will merge this into his PR).

static markWord make_prototype(const Klass* kls) {
  markWord prototype = markWord::prototype();
#ifdef _LP64
  if (UseCompactObjectHeaders) {
    // With compact object headers, the narrow Klass ID is part of the mark word.
    // We therfore seed the mark word with the narrow Klass ID.
    // Note that only those Klass that can be instantiated have a narrow Klass ID.
    // For those who don't, we leave the klass bits empty and assert if someone
    // tries to use those.
    const narrowKlass nk = CompressedKlassPointers::is_encodable(kls) ?
        CompressedKlassPointers::encode(const_cast<Klass*>(kls)) : 0;
    prototype = prototype.set_narrow_klass(nk);
  }
#endif
  return prototype;
}

inline bool CompressedKlassPointers::is_encodable(const void* address) {
  check_init(_base);
  // An address can only be encoded if:
  //
  // 1) the address lies within the klass range.
  // 2) It is suitably aligned to 2^encoding_shift. This only really matters for
  //    +UseCompactObjectHeaders, since the encoding shift can be large (max 10 bits -> 1KB).
  return is_aligned(address, klass_alignment_in_bytes()) &&
      address >= _klass_range_start && address < _klass_range_end;
}

So, we put an nKlass into the prototype if we can. We can, if the Klass address is encodable. It is encodable if it lives in the encoded Klass range and is correctly aligned. No need to pass this information via another channel: its right there, in the Klass address. This works even before Klass is initialized.

tstuefe avatar Sep 15 '24 06:09 tstuefe

  • Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that.

I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.

An alternative that seems promising is to hide the object header klass pointer extraction and make it part of the LoadNKlass node semantics, as illustrated in this example:

alternative-modeling

LoadNKlass nodes can then be expanded into more primitive operations (load and shift for compact headers, load with klass_offset_in_bytes() for original headers) within C2's back-end or even during code emission as sketched here. @rkennke is this similar to what you tried out ("Expanding it as a macro")?

robcasloz avatar Sep 16 '24 08:09 robcasloz

LoadNKlass nodes can then be expanded into more primitive operations (load and shift for compact headers, load with klass_offset_in_bytes() for original headers) within C2's back-end or even during code emission as sketched here. @rkennke is this similar to what you tried out ("Expanding it as a macro")?

No, this is not what I tried. I tried to completely expand LoadNKlass, and replace it with the lower nodes that load and shift the mark-word right there, in ideal graph. But your approach is saner: there is so much implicit knowledge about Load(N)Klass, and even klass_offset_in_bytes(), all over the place, it would be very hard to get this right without breaking something.

rkennke avatar Sep 16 '24 13:09 rkennke

Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support?

mur47x111 avatar Sep 18 '24 12:09 mur47x111

I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.

What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on oopDesc::klass_offset_in_bytes()) being a pre-condition for a future, non-experimental version of compact headers?

robcasloz avatar Sep 19 '24 11:09 robcasloz

I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.

What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on oopDesc::klass_offset_in_bytes()) being a pre-condition for a future, non-experimental version of compact headers?

Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work?

rkennke avatar Sep 19 '24 11:09 rkennke

⚠️ @rkennke This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

openjdk[bot] avatar Sep 19 '24 12:09 openjdk[bot]

I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version.

What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on oopDesc::klass_offset_in_bytes()) being a pre-condition for a future, non-experimental version of compact headers?

Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work?

Done: https://bugs.openjdk.org/browse/JDK-8340453.

robcasloz avatar Sep 19 '24 13:09 robcasloz

In both aarch64.ad and x86_64.ad, MachUEPNode::format might need some change accordingly?

Hamlin-Li avatar Sep 19 '24 15:09 Hamlin-Li

Indeed, I could re-enable all tests in:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java

but unfortunately not those others:

> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java

I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset.

I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.

@rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N with N <= 3 on an Intel Xeon Platinum 8358 machine:

  • test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
  • test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
  • test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java

Here are the failure details:

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:

1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!


test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java:

1) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte1(byte[],byte[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

2) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte2(byte[],byte[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

3) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong1(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

4) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong2(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

5) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong3(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

6) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong5(byte[],long[],int,int)" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\\[2\\]:\\{long\\})"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!
         * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!


test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java:

1) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndComplexExpression()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

2) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndInvariant()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

robcasloz avatar Sep 30 '24 12:09 robcasloz

test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:

I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE>3?

rkennke avatar Oct 01 '24 15:10 rkennke