jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8331865: Consolidate size and alignment checks in LayoutPath

Open mcimadamore opened this issue 1 year ago • 9 comments

When creating a nested memory access var handle, we ensure that the segment is accessed at the correct alignment for the root layout being accessed. But we do not ensure that the segment has at least the size of the accessed root layout. Example:

MemoryLayout LAYOUT = sequenceLayout(2, structLayout(JAVA_INT.withName("x"), JAVA_INT.withName("y")));
VarHandle X_HANDLE = LAYOUT.varHandle(PathElement.sequenceElement(), PathElement.groupElement("x"));

If I have a memory segment whose size is 12, I can successfully call X_HANDLE on it with index 1, like so:

MemorySegment segment = Arena.ofAuto().allocate(12);
int x = (int)X_HANDLE.get(segment, 0, 1);

This seems incorrect: after all, the provided segment doesn't have enough space to fit two elements of the nested structs.

This PR adds checks to detect this kind of scenario earlier, thus improving usability. To achieve this we could, in principle, just tweak LayoutPath::checkEnclosingLayout and add the required size check.

But doing so will add yet another redundant check on top of the other already added by pull/19124. Instead, this PR rethinks how memory segment var handles are created, in order to eliminate redundant checks.

The main observation is that size and alignment checks depend on an enclosing layout. This layout might be the very same value layout being accessed (this is the case when e.g. using JAVA_INT.varHandle()). But, in the general case, the accessed value layout and the enclosing layout might differ (e.g. think about accessing a struct field).

Furthermore, the enclosing layout check depends on the base offset at which the segment is accessed, prior to any index computation that occurs if the accessed layout path has any open elements. It is important to notice that this base offset is only available when looking at the var handle that is returned to the user. For instance, an indexed var handle with coordinates:

(MemorySegment, long /* base */, long /* index 1 */, long /* index 2 */, long /* index 3 */)

Is obtained through adaptation, by taking a more basic var handle of the form:

(MemorySegment, long /* offset */)

And then injecting the result of the index multiplication into offset. As such, we can't add an enclosing layout check inside the var handle returned by VarHandles::memorySegmentViewHandle, as doing so will end up seeing the wrong offset (e.g. an offset in which the index part has already been mixed in).

The only possibility then, is to remove size and alignment checks from the raw var handles returned by VarHandles::memorySegmentViewHandle, and perform such checks outside (e.g. in LayoutPath::dereferenceHandle). The only checks left in the raw var handles are:

  • an alignment check to make sure that the access mode selected by the client is really available - this alignment check is against the alignment of the value layout being selected, and not the enclosing layout alignment (which might be stricter)
  • a read-only check, to make sure that write access modes are blocked if the accessed segment is read-only
  • liveness/confinement checks, as mandated by ScopedMemoryAccess

Since these check depends on the particular access mode selected by the client, we can't move these checks away from the raw var handle.

These changes come with some consequences:

  • Now it is always necessary to adapt a raw var handle, and to insert appropriate size and alignment checks (otherwise OOB access might be possible). As such, ValueLayouts also need to call the path layout variant of MemoryLayout::varHandle, to make sure that the raw var handle is adapted accordingly, before it is cached in its stable field.
  • The var handle cache has been moved from Utils to ValueLayouts::varHandle. The cache is used (a) to reduce the number of var handle instances created and (b) to make sure that the cached var handle in the ValueLayout has stable identity. With respect to (a), while it makes sense to cache "toplevel" var handles (e.g. JAVA_INT.varHandle()) the cost/benefit ratio for caching nested var handles seem more unfavourable, as the same var handle can be reused with different enclosing layouts, leading to different checks. Ultimately, all nested var handles will require some kind of adaptation, so it doesn't seem too crucial to have a deeper level of caching here.
  • The order in which exceptions are thrown might be slightly different. This is because the size/alignment checks now take place before the raw var handle is even called. This caused a bunch of small updates in code and tests.
  • It used to be possible to create a sequence layout with maximal size, like MemoryLayout.sequenceLayout(Long.MAX_VALUE, JAVA_LONG), and derive an indexed var handle from that layout. Since there used to be no enclosing layout size check, access to the sequence element was allowed, as long as the index computation did not result in an offset outside the boundary of the accessed memory segment. This is now no longer the case: when selecting an element from the above layout, the implementation will make sure that the accessed segment indeed has the size of that sequence layout (which will probably lead to a IndexOutOfBoundException). To do indexed access on an unbounded sequence, the MemoryLayout::arrayElementVarHandle should be used instead (but this is the norm anyway for such cases).

Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue
  • [ ] Change requires a CSR request matching fixVersion 23 to be approved (needs to be created)

Issue

  • JDK-8331865: Consolidate size and alignment checks in LayoutPath (Enhancement - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19251/head:pull/19251
$ git checkout pull/19251

Update a local copy of the PR:
$ git checkout pull/19251
$ git pull https://git.openjdk.org/jdk.git pull/19251/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19251

View PR using the GUI difftool:
$ git pr show -t 19251

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19251.diff

Webrev

Link to Webrev Comment

mcimadamore avatar May 15 '24 15:05 mcimadamore

:wave: Welcome back mcimadamore! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar May 15 '24 15:05 bridgekeeper[bot]

@mcimadamore This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8331865: Consolidate size and alignment checks in LayoutPath

Reviewed-by: psandoz, jvernee

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 162 new commits pushed to the master branch:

  • da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e: 8332849: Update doc/testing.{md,html} (spelling and stale information)
  • b8f2ec9091f9f7e5f4611991d04dd8aa113b94fd: 8195675: Call to insertText with single character from custom Input Method ignored
  • 0f3e2cc334e5926d53bbbce22e4a6bfeb2752140: 8331670: Deprecate the Memory-Access Methods in sun.misc.Unsafe for Removal
  • 51ae08f72b879bc611177ea643cd88e36185d9e8: 8333093: Incorrect comment in zAddress_aarch64.cpp
  • 4754f059f99a426cc8c5d94b0809e79d563ffc2e: 8333035: Parallel: Remove ParMarkBitMap::IterationStatus
  • 87a06b6ce41f8623d9111b4e41c72f0ddf842acd: 8325805: Compiler Implementation for Flexible Constructor Bodies (Second Preview)
  • e708d135e3af7e0652cdbb680388a0735582ba74: 8332064: Implementation of Structured Concurrency (Third Preview)
  • 7b52d0acfc7d6083b407efa0877c139e9837f86b: 8332265: RISC-V: Materialize pointers faster by using a temp register
  • aa4c83a5bfe146714a46fb454aafc7393d2d8453: 8332505: JEP 457: ClassRemapper forgets to remap bootstrap method references
  • cabe337400a0bd61d73bf3ca66e16266267299c7: 8331921: Hotspot assembler files should use common logic to setup exported functions
  • ... and 152 more: https://git.openjdk.org/jdk/compare/61aff6db15d5bdda77427af5ce34d0fe43373197...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar May 15 '24 15:05 openjdk[bot]

@mcimadamore The core-libs label was successfully added.

openjdk[bot] avatar May 15 '24 15:05 openjdk[bot]

/csr

mcimadamore avatar May 16 '24 10:05 mcimadamore

@mcimadamore has indicated that a compatibility and specification (CSR) request is needed for this pull request.

@mcimadamore please create a CSR request for issue JDK-8331865 with the correct fix version. This pull request cannot be integrated until the CSR request is approved.

openjdk[bot] avatar May 16 '24 10:05 openjdk[bot]

Webrevs

mlbridge[bot] avatar May 16 '24 11:05 mlbridge[bot]

Do we have any performance tests available to see if there are any potential impacts?

minborg avatar May 16 '24 11:05 minborg

Do we have any performance tests available to see if there are any potential impacts?

I've run all micro benchmarks whose name match LoopOver*. No regression was found. Few benchmarks seems a tad faster, but the percentages are so tiny that I'm sure it has nothing to do with the patch. I compared results before/after using JMH visualizer.

mcimadamore avatar May 16 '24 13:05 mcimadamore

Do we have any performance tests available to see if there are any potential impacts?

I've run all micro benchmarks whose name match LoopOver*. No regression was found. Few benchmarks seems a tad faster, but the percentages are so tiny that I'm sure it has nothing to do with the patch. I compared results before/after using JMH visualizer.

Results here: https://cr.openjdk.org/~mcimadamore/jdk/8331865/ Unfortunately I can't seem to be able to upload the results in the visualizer (which would be handy, so I could share a link to the results). Not sure if it's an issue in the visualizer, or in the cr server itself. But, it is possible to at least download the above results locally, and then upload them to the visualizer tool.

mcimadamore avatar May 16 '24 14:05 mcimadamore

Separately, we might be missing a few long argument accepting guard methods for simpler cases as I suspect they are still focused more on int index types.

Not sure I understand what guard methods you are referring to here?

mcimadamore avatar May 20 '24 09:05 mcimadamore

Separately, we might be missing a few long argument accepting guard methods for simpler cases as I suspect they are still focused more on int index types.

Not sure I understand what guard methods you are referring to here?

Ah, got it. You mean more support in VarHandleGuards. Yes, I have a separate patch for that (actually had that for quite a while), but we're not super sure how to evaluate what impact it has :-)

mcimadamore avatar May 20 '24 16:05 mcimadamore

Ah, got it. You mean more support in VarHandleGuards. Yes, I have a separate patch for that (actually had that for quite a while), but we're not super sure how to evaluate what impact it has :-)

Ah, i did not realize that. Yes its tricky, it was designed for VHs to fields/arrays, to really minimize their overhead. With segments there is already some additional overhead e.g.,

        if (derefAdapters.length == 0) {
            // insert align check for the root layout on the initial MS + offset
            List<Class<?>> coordinateTypes = handle.coordinateTypes();
            MethodHandle alignCheck = MethodHandles.insertArguments(MH_CHECK_ENCL_LAYOUT, 2, rootLayout());
            handle = MethodHandles.collectCoordinates(handle, 0, alignCheck);
            int[] reorder = IntStream.concat(IntStream.of(0, 1), IntStream.range(0, coordinateTypes.size())).toArray();
            handle = MethodHandles.permuteCoordinates(handle, coordinateTypes, reorder);
        }

So perhaps it does not make much difference.

PaulSandoz avatar May 20 '24 16:05 PaulSandoz

some additional overhead e.g.,

        if (derefAdapters.length == 0) {
            // insert align check for the root layout on the initial MS + offset
            List<Class<?>> coordinateTypes = handle.coordinateTypes();
            MethodHandle alignCheck = MethodHandles.insertArguments(MH_CHECK_ENCL_LAYOUT, 2, rootLayout());
            handle = MethodHandles.collectCoordinates(handle, 0, alignCheck);
            int[] reorder = IntStream.concat(IntStream.of(0, 1), IntStream.range(0, coordinateTypes.size())).toArray();
            handle = MethodHandles.permuteCoordinates(handle, coordinateTypes, reorder);
        }

True, the chain for segment var handle is quite long (and that is not a result of this patch, it has always been more or less like that). Funnily, I was staring at this piece of code the other day, and I think this can be dealt with morally with a foldArguments, but we don't have the equivalent of that in the VH/coordinates world. Maybe we should add that (or at least implement internally) as that would simplify the adaptation, avoiding the permute step (which is typically rather heavy).

mcimadamore avatar May 20 '24 18:05 mcimadamore

/integrate

mcimadamore avatar May 29 '24 11:05 mcimadamore

Going to push as commit c003c1207fae07bcfe5a6f642a9c05e6c591e7a6. Since your change was applied there have been 173 commits pushed to the master branch:

  • 6d718ae51aeb7143ebfa561501b87fe1ba48039a: 8324341: Remove redundant preprocessor #if's checks
  • 9b64ece514cf941ebc727991d97c43453d8a488d: 8332904: ubsan ppc64le: c1_LIRGenerator_ppc.cpp:581:21: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long int'
  • 3d4eb159e6d597f37081faf21b7e3f0f1af299e5: 8302744: Refactor Hotspot container detection code
  • 2cca83bc82eb6b090ae96b8c072b986b93d9244a: 8332880: JFR GCHelper class recognizes "Archive" regions as valid
  • b8ae11e99b99866888ad090c98c96e6d0c33a3c9: 8332960: ubsan: classListParser.hpp:159:12: runtime error: load of value 2101478704, which is not a valid value for type 'ParseMode'
  • 9a83dfee14f4cd9cda476d11a027294a810953cb: 8332431: NullPointerException in JTable of SwingSet2
  • 01060ad4ab18581aa46bc16e64c7f12a591a682b: 8325083: jdk/incubator/vector/Double512VectorTests.java crashes in Assembler::vex_prefix_and_encode
  • 673f767dadc8f3a784b9c31c406422846df3279b: 8285506: Unify os::vsnprintf implementations
  • 91ab088d5e64e068bafcda8d08f1769c39ba10d6: 8333116: test/jdk/tools/jpackage/share/ServiceTest.java test fails
  • 9ac8d05a2567fbf65b944660739e5f8ad1fc2020: 8332228: TypePollution.java: Unrecognized VM option 'UseSecondarySuperCache'
  • ... and 163 more: https://git.openjdk.org/jdk/compare/61aff6db15d5bdda77427af5ce34d0fe43373197...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar May 29 '24 11:05 openjdk[bot]

@mcimadamore Pushed as commit c003c1207fae07bcfe5a6f642a9c05e6c591e7a6.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar May 29 '24 11:05 openjdk[bot]

Do we have any performance tests available to see if there are any potential impacts?

I've run all micro benchmarks whose name match LoopOver*. No regression was found. Few benchmarks seems a tad faster, but the percentages are so tiny that I'm sure it has nothing to do with the patch. I compared results before/after using JMH visualizer.

Results here: https://cr.openjdk.org/~mcimadamore/jdk/8331865/ Unfortunately I can't seem to be able to upload the results in the visualizer (which would be handy, so I could share a link to the results). Not sure if it's an issue in the visualizer, or in the cr server itself. But, it is possible to at least download the above results locally, and then upload them to the visualizer tool.

Here's a link to the results: https://jmh.morethan.io/?sources=https://corsproxy.io/?https://cr.openjdk.org/~mcimadamore/jdk/8331865/loop_over_00_baseline.json,https://corsproxy.io/?https://cr.openjdk.org/~mcimadamore/jdk/8331865/loop_over_01_8331865.json

mcimadamore avatar May 29 '24 15:05 mcimadamore