8347711: [Lilliput] Parallel GC support for compact identity hashcode
Until now, Parallel GC did not support Lilliput 2. The big problem has been that the Parallel full GC is too rigid with respect to object sizes: we could not make it work with compact identity hashcodes, which require that objects can grow during GC.
This PR implements an alternative full GC for Parallel GC which is more flexible. The algorithm mostly follows G1 and Shenandoah, with the difference that it creates temporary 'regions' (because Parallel GC does not use heap regions), with boundaries chosen such that no object crosses a region boundary, and that after the GC it fills any gaps at the ends of regions with dummy objects.
The implementation has a special 'serial' mode, which sets up only 4 regions that exactly match the 4 heap spaces (old, eden, from, to) and performs the forwarding and compaction phases serially, to achieve perfect compaction at the expense of performance. (The marking and adjust-refs phases are still done with parallel workers.)
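Schematically, the temporary-region idea works roughly as in the following sketch. All names here (`TempRegion`, its fields, `fill_trailing_gap`) are hypothetical illustrations, not identifiers from this PR; only `CollectedHeap::fill_with_objects` and `pointer_delta` are existing HotSpot utilities.

```c++
// Hypothetical sketch of the temporary-region idea; not the PR's actual code.
struct TempRegion {
  HeapWord* _bottom;       // first word covered by this region
  HeapWord* _end;          // boundary chosen so that no live object crosses it
  HeapWord* _compact_top;  // next free word while compacting into this region
};

// After compaction, make the unused tail of a region parseable by covering
// it with dummy (filler) objects, so that heap walkers still see a
// contiguous sequence of objects up to _end.
static void fill_trailing_gap(TempRegion* r) {
  size_t gap_words = pointer_delta(r->_end, r->_compact_top);
  if (gap_words > 0) {
    CollectedHeap::fill_with_objects(r->_compact_top, gap_words);
  }
}
```

Because no object crosses a region boundary, each region can be compacted independently by a worker, and the dummy objects keep the spaces walkable afterwards.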
I've run the systemgc microbenchmarks; there seem to be only minor differences, which look mostly like a few milliseconds of constant offset in the new implementation:
Baseline Full GC:

| Benchmark | Mode | Cnt | Score | Error | Units |
|-----------|------|----:|------:|------:|-------|
| AllDead.gc | ss | 25 | 31.120 | ± 0.447 | ms/op |
| AllLive.gc | ss | 25 | 83.655 | ± 2.238 | ms/op |
| DifferentObjectSizesArray.gc | ss | 25 | 179.725 | ± 1.171 | ms/op |
| DifferentObjectSizesHashMap.gc | ss | 25 | 186.011 | ± 1.409 | ms/op |
| DifferentObjectSizesTreeMap.gc | ss | 25 | 65.668 | ± 3.333 | ms/op |
| HalfDeadFirstPart.gc | ss | 25 | 64.862 | ± 0.696 | ms/op |
| HalfDeadInterleaved.gc | ss | 25 | 67.764 | ± 3.139 | ms/op |
| HalfDeadInterleavedChunks.gc | ss | 25 | 59.160 | ± 1.667 | ms/op |
| HalfDeadSecondPart.gc | ss | 25 | 66.210 | ± 1.167 | ms/op |
| HalfHashedHalfDead.gc | ss | 25 | 69.584 | ± 2.276 | ms/op |
| NoObjects.gc | ss | 25 | 18.462 | ± 0.270 | ms/op |
| OneBigObject.gc | ss | 25 | 587.425 | ± 27.493 | ms/op |
New Parallel Full GC:

| Benchmark | Mode | Cnt | Score | Error | Units |
|-----------|------|----:|------:|------:|-------|
| AllDead.gc | ss | 25 | 39.891 | ± 0.461 | ms/op |
| AllLive.gc | ss | 25 | 87.898 | ± 1.940 | ms/op |
| DifferentObjectSizesArray.gc | ss | 25 | 184.109 | ± 0.795 | ms/op |
| DifferentObjectSizesHashMap.gc | ss | 25 | 189.620 | ± 2.236 | ms/op |
| DifferentObjectSizesTreeMap.gc | ss | 25 | 69.915 | ± 3.308 | ms/op |
| HalfDeadFirstPart.gc | ss | 25 | 70.664 | ± 0.804 | ms/op |
| HalfDeadInterleaved.gc | ss | 25 | 71.318 | ± 1.583 | ms/op |
| HalfDeadInterleavedChunks.gc | ss | 25 | 65.050 | ± 1.827 | ms/op |
| HalfDeadSecondPart.gc | ss | 25 | 70.964 | ± 0.878 | ms/op |
| HalfHashedHalfDead.gc | ss | 25 | 72.506 | ± 1.617 | ms/op |
| NoObjects.gc | ss | 25 | 23.809 | ± 0.494 | ms/op |
| OneBigObject.gc | ss | 25 | 403.461 | ± 27.079 | ms/op |
Testing:
- [x] tier1 (+UseParallelGC +UCOH)
- [x] tier2 (+UseParallelGC +UCOH)
- [x] hotspot_gc (+UCOH)
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8347711: [Lilliput] Parallel GC support for compact identity hashcode (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/lilliput.git pull/195/head:pull/195
$ git checkout pull/195
Update a local copy of the PR:
$ git checkout pull/195
$ git pull https://git.openjdk.org/lilliput.git pull/195/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 195
View PR using the GUI difftool:
$ git pr show -t 195
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/lilliput/pull/195.diff
:wave: Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@rkennke This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8347711: [Lilliput] Parallel GC support for compact identity hashcode
Reviewed-by: stuefe
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
Webrevs
- 05: Full - Incremental (163798bb)
- 04: Full - Incremental (40512654)
- 03: Full - Incremental (95728b46)
- 02: Full - Incremental (738c3093)
- 01: Full - Incremental (ec376a91)
- 00: Full (08703bcf)
Can you articulate why object cannot grow?
Yes. The way PSParallelCompact currently works: it collects liveness information during the marking phase. At the end of it, we know precisely how many words are live in each region. In this phase, we also determine where objects cross region boundaries. Then, during the summary phase, it uses this information to determine which source regions get compacted into which target regions. It also determines the addresses at which a region gets split, that is, the part below the split point gets compacted into one region, and the part above the split point gets compacted into another region. Then, during the forwarding phase, all of this information is used to calculate the target addresses for each object in parallel. This is what makes it possible to divide the forwarding work across several GC worker threads: we know exactly where each region gets compacted to.
The problem is that, with compact identity hashcodes, we don't know the size of the target object until the forwarding phase. That is because the size depends on whether or not an object moves at all, and we don't know that until we do the actual forwarding. One possibility might be to pessimistically assume that all objects get moved, and thus might need to grow (if they have an i-hash and no gap to store it in), and then deal with the waste that this produces, but some of the calculations about the split points and the locations where objects cross region boundaries would still be thrown off.
The new approach is much more flexible, and the implementation is much simpler (but it tends to produce more waste).
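For illustration, here is a hedged sketch of the circular dependency described above; the names (`live_words_before`, `needs_hash_expansion`) are hypothetical, not PSParallelCompact's actual code. The parallel forwarding math relies on exact per-region live-word counts, while with compact identity hashcodes an object's copy size depends on whether it moves, which in turn depends on those very counts.

```c++
// Hypothetical sketch; not actual PSParallelCompact code.
// Classic scheme: a region's destination is a prefix sum of the live words
// of all preceding regions, so each worker can forward objects independently.
HeapWord* forwardee(HeapWord* obj_addr, RegionData* region) {
  // live_words_before(): live words between the region start and obj_addr,
  // derived from the mark bitmap (hypothetical helper).
  return region->destination() + live_words_before(region, obj_addr);
}

// With compact identity hashcodes, the copy size is no longer a constant:
size_t copy_size(oop obj, bool moves) {
  // Hypothetical helper: true iff an i-hash was installed while the object
  // had a compact header and there is no gap to absorb the extra word.
  bool grows = moves && needs_hash_expansion(obj);
  return obj->size() + (grows ? 1 : 0);  // grows by one word when moved
}
// 'moves' is only known once the destination is known, and the destination
// depends on the (now variable) sizes of all preceding objects -- hence
// the rigidity described above.
```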
You can count liveness with or without object expansion during the mark phase, can't you? Given that it reads the object header and size anyway. I assume that only regions that "overflow" due to expansion can cause problems, but the chance of that is very slim; special-casing them wouldn't be so bad, right?
> You can count liveness with or without object expansion during the mark phase, can't you?
Yes, I can. I actually implemented that in an earlier attempt.
> I assume that only regions that "overflow" due to expansion can cause problems, but the chance of that is very slim; special-casing them wouldn't be so bad, right?
The problem is not potential overflow. In fact, overflow cannot really happen. The problem is that all those calculations are assumed, and required, to be precise. Turning them into guesses breaks the algorithm.
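As a hedged illustration of the "count expansion during marking" idea and its limit (names hypothetical): one can account each object at its worst-case size while marking, but the summary phase then plans with sizes that may be wrong by a word per object, which invalidates the otherwise-exact split points and boundary-crossing maps.

```c++
// Hypothetical sketch: pessimistic live-size accounting during marking.
size_t marked_live_words(oop obj) {
  // Worst case: assume the object moves and must grow to carry its i-hash.
  return obj->size() + (has_installed_ihash(obj) ? 1 : 0);
}
// If the object then does NOT move (or a gap absorbs the hash), the real
// compacted size is smaller than planned, so the precomputed destinations,
// split points and crossing info no longer match reality.
```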
How does this algorithm deal with objects larger than region size?
Currently not particularly well: it doesn't move them at all (because they don't fit), and it also doesn't move other objects past them. There's a TODO to make it possible to move objects around large objects (they don't have to be larger than a region; even something like half-region-sized objects may be difficult to move). I'll address that in a follow-up PR.
Are you talking about something Parallel-specific? Can you explain the region size > object size > 1/2 region size scenario? Thanks.
I only used 1/2 region size as an example. It can affect any object: if there is not enough space in the target region, and there is no other preceding target region in the queue, we might not be able to move the object, or we might only be able to move it within its own region. The problem just becomes worse the larger the object is, and for objects larger than a region it is currently guaranteed that we cannot move them. Being able to move objects around large/unmovable objects would help a lot to reduce gaps.
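A hedged sketch of the fallback being described, reusing the hypothetical `TempRegion` from the earlier sketch (again, not the PR's actual code): when an object does not fit into the space remaining in the current target region and no earlier target region is queued, it stays in place and compaction continues behind it.

```c++
// Hypothetical sketch; not the PR's actual code.
HeapWord* forward_or_leave_in_place(oop obj, size_t size, TempRegion* target) {
  size_t room = pointer_delta(target->_end, target->_compact_top);
  if (size > room) {
    // Does not fit, and no preceding target region is available:
    // leave the object where it is; the gap before it becomes waste
    // (later covered by a dummy object).
    return cast_from_oop<HeapWord*>(obj);
  }
  HeapWord* dest = target->_compact_top;
  target->_compact_top += size;  // bump allocation within the target region
  return dest;
}
```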
Thanks! /integrate
Going to push as commit 4b6094a023e6d9f3c74a0afc65192d7e81c7e3a0.
Since your change was applied there has been 1 commit pushed to the master branch:
- e20e6c8450d355478a564f9b6147ab5bc0ffdf7d: 8353849: [Lilliput] Avoid race in compact identity hashcode
Your commit was automatically rebased without conflicts.
@rkennke Pushed as commit 4b6094a023e6d9f3c74a0afc65192d7e81c7e3a0.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.