8347711: [Lilliput] Parallel GC support for compact identity hashcode
Until now, Parallel GC did not support Lilliput 2. The big problem has been that the Parallel full GC is too rigid with respect to object sizes: we could not make it work with compact identity hashcodes, which require that objects can grow during GC.
This PR implements an alternative full GC for Parallel GC which is more flexible. The algorithm mostly follows G1 and Shenandoah, with the difference that it creates temporary 'regions' (because Parallel GC does not use heap regions), with boundaries chosen such that no object crosses a region boundary, and that after the GC it fills any gaps at the ends of regions with dummy objects.
The implementation has a special 'serial' mode, which sets up only 4 regions that exactly match the 4 heap spaces (old, eden, from, to) and performs the forwarding and compaction phases serially, to achieve perfect compaction at the expense of performance. (The marking and adjust-refs phases are still done with parallel workers.)
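Schematically, the temporary-region idea works roughly as in the following sketch. All names here (`TempRegion`, its fields, `fill_trailing_gap`) are hypothetical illustrations, not identifiers from this PR; only `CollectedHeap::fill_with_objects` and `pointer_delta` are existing HotSpot utilities.

```c++
// Hypothetical sketch of the temporary-region idea; not the PR's actual code.
struct TempRegion {
  HeapWord* _bottom;       // first word covered by this region
  HeapWord* _end;          // boundary chosen so that no live object crosses it
  HeapWord* _compact_top;  // next free word while compacting into this region
};

// After compaction, make the unused tail of a region parseable by covering
// it with dummy (filler) objects, so that heap walkers still see a
// contiguous sequence of objects up to _end.
static void fill_trailing_gap(TempRegion* r) {
  size_t gap_words = pointer_delta(r->_end, r->_compact_top);
  if (gap_words > 0) {
    CollectedHeap::fill_with_objects(r->_compact_top, gap_words);
  }
}
```

Because no object crosses a region boundary, each region can be compacted independently by a worker, and the dummy objects keep the spaces walkable afterwards.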
I've run the systemgc microbenchmarks; there seem to be only minor differences, which look mostly like a few milliseconds of constant offset in the new implementation:
Baseline Full GC:

| Benchmark | Mode | Cnt | Score | Error | Units |
|-----------|------|----:|------:|------:|-------|
| AllDead.gc | ss | 25 | 31.120 | ± 0.447 | ms/op |
| AllLive.gc | ss | 25 | 83.655 | ± 2.238 | ms/op |
| DifferentObjectSizesArray.gc | ss | 25 | 179.725 | ± 1.171 | ms/op |
| DifferentObjectSizesHashMap.gc | ss | 25 | 186.011 | ± 1.409 | ms/op |
| DifferentObjectSizesTreeMap.gc | ss | 25 | 65.668 | ± 3.333 | ms/op |
| HalfDeadFirstPart.gc | ss | 25 | 64.862 | ± 0.696 | ms/op |
| HalfDeadInterleaved.gc | ss | 25 | 67.764 | ± 3.139 | ms/op |
| HalfDeadInterleavedChunks.gc | ss | 25 | 59.160 | ± 1.667 | ms/op |
| HalfDeadSecondPart.gc | ss | 25 | 66.210 | ± 1.167 | ms/op |
| HalfHashedHalfDead.gc | ss | 25 | 69.584 | ± 2.276 | ms/op |
| NoObjects.gc | ss | 25 | 18.462 | ± 0.270 | ms/op |
| OneBigObject.gc | ss | 25 | 587.425 | ± 27.493 | ms/op |
New Parallel Full GC:

| Benchmark | Mode | Cnt | Score | Error | Units |
|-----------|------|----:|------:|------:|-------|
| AllDead.gc | ss | 25 | 39.891 | ± 0.461 | ms/op |
| AllLive.gc | ss | 25 | 87.898 | ± 1.940 | ms/op |
| DifferentObjectSizesArray.gc | ss | 25 | 184.109 | ± 0.795 | ms/op |
| DifferentObjectSizesHashMap.gc | ss | 25 | 189.620 | ± 2.236 | ms/op |
| DifferentObjectSizesTreeMap.gc | ss | 25 | 69.915 | ± 3.308 | ms/op |
| HalfDeadFirstPart.gc | ss | 25 | 70.664 | ± 0.804 | ms/op |
| HalfDeadInterleaved.gc | ss | 25 | 71.318 | ± 1.583 | ms/op |
| HalfDeadInterleavedChunks.gc | ss | 25 | 65.050 | ± 1.827 | ms/op |
| HalfDeadSecondPart.gc | ss | 25 | 70.964 | ± 0.878 | ms/op |
| HalfHashedHalfDead.gc | ss | 25 | 72.506 | ± 1.617 | ms/op |
| NoObjects.gc | ss | 25 | 23.809 | ± 0.494 | ms/op |
| OneBigObject.gc | ss | 25 | 403.461 | ± 27.079 | ms/op |
Testing:
- [x] tier1 (+UseParallelGC +UCOH)
- [x] tier2 (+UseParallelGC +UCOH)
- [x] hotspot_gc (+UCOH)
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8347711: [Lilliput] Parallel GC support for compact identity hashcode (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/lilliput.git pull/195/head:pull/195
$ git checkout pull/195
Update a local copy of the PR:
$ git checkout pull/195
$ git pull https://git.openjdk.org/lilliput.git pull/195/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 195
View PR using the GUI difftool:
$ git pr show -t 195
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/lilliput/pull/195.diff
:wave: Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@rkennke This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8347711: [Lilliput] Parallel GC support for compact identity hashcode
Reviewed-by: stuefe
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
Webrevs
- 05: Full - Incremental (163798bb)
- 04: Full - Incremental (40512654)
- 03: Full - Incremental (95728b46)
- 02: Full - Incremental (738c3093)
- 01: Full - Incremental (ec376a91)
- 00: Full (08703bcf)
Can you articulate why object cannot grow?
Yes. The way PSParallelCompact currently works: it collects liveness information during the marking phase. At the end of it, we know precisely how many words are live in each region. In this phase, we also determine where objects cross region boundaries. Then, during the summary phase, it uses this information to determine which source regions get compacted into which target regions. It also determines the addresses at which a region gets split, that is, the part below the split point gets compacted into one region, and the part above the split point gets compacted into another region. Then, during the forwarding phase, all of this information is used to calculate the target addresses for each object in parallel. This is what makes it possible to divide the forwarding work across several GC worker threads: we know exactly where each region gets compacted to.
The problem is that, with compact identity hashcodes, we don't know the size of the target object until the forwarding phase. That is because the size depends on whether or not an object moves at all, and we don't know that until we do the actual forwarding. One possibility might be to pessimistically assume that all objects get moved, and thus might need to grow (if they have an i-hash and no gap to store it in), and then deal with the waste that this produces, but some of the calculations about the split points and the locations where objects cross region boundaries would still be thrown off.
The new approach is much more flexible, and the implementation is much simpler (but it tends to produce more waste).
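For illustration, here is a hedged sketch of the circular dependency described above; the names (`live_words_before`, `needs_hash_expansion`) are hypothetical, not PSParallelCompact's actual code. The parallel forwarding math relies on exact per-region live-word counts, while with compact identity hashcodes an object's copy size depends on whether it moves, which in turn depends on those very counts.

```c++
// Hypothetical sketch; not actual PSParallelCompact code.
// Classic scheme: a region's destination is a prefix sum of the live words
// of all preceding regions, so each worker can forward objects independently.
HeapWord* forwardee(HeapWord* obj_addr, RegionData* region) {
  // live_words_before(): live words between the region start and obj_addr,
  // derived from the mark bitmap (hypothetical helper).
  return region->destination() + live_words_before(region, obj_addr);
}

// With compact identity hashcodes, the copy size is no longer a constant:
size_t copy_size(oop obj, bool moves) {
  // Hypothetical helper: true iff an i-hash was installed while the object
  // had a compact header and there is no gap to absorb the extra word.
  bool grows = moves && needs_hash_expansion(obj);
  return obj->size() + (grows ? 1 : 0);  // grows by one word when moved
}
// 'moves' is only known once the destination is known, and the destination
// depends on the (now variable) sizes of all preceding objects -- hence
// the rigidity described above.
```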
You can count liveness with or without object expansion during the mark phase, can't you? Given that it reads the object header and size anyway. I assume that only regions that "overflow" due to expansion can cause problems, but the chance of that is very slim; special-casing them wouldn't be so bad, right?
> You can count liveness with or without object expansion during the mark phase, can't you?
Yes, I can. I actually implemented that in an earlier attempt.
> I assume that only regions that "overflow" due to expansion can cause problems, but the chance of that is very slim; special-casing them wouldn't be so bad, right?
The problem is not potential overflow. In fact, overflow cannot really happen. The problem is that all those calculations are assumed, and required, to be precise. Turning them into guesses breaks the algorithm.
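As a hedged illustration of the "count expansion during marking" idea and its limit (names hypothetical): one can account each object at its worst-case size while marking, but the summary phase then plans with sizes that may be wrong by a word per object, which invalidates the otherwise-exact split points and boundary-crossing maps.

```c++
// Hypothetical sketch: pessimistic live-size accounting during marking.
size_t marked_live_words(oop obj) {
  // Worst case: assume the object moves and must grow to carry its i-hash.
  return obj->size() + (has_installed_ihash(obj) ? 1 : 0);
}
// If the object then does NOT move (or a gap absorbs the hash), the real
// compacted size is smaller than planned, so the precomputed destinations,
// split points and crossing info no longer match reality.
```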
How does this algorithm deal with objects larger than region size?
Currently not particularly well: it doesn't move them at all (because they don't fit), and it also doesn't move other objects past them. There's a TODO to make it possible to move objects around large objects (they don't have to be larger than a region; even something like half-region-sized objects may be difficult to move). I'll address that in a follow-up PR.
Are you talking about something Parallel-specific? Can you explain the region size > object size > 1/2 region size scenario? Thanks.
I only used 1/2 region size as an example. It can affect any object: if there is not enough space in the target region, and there is no other preceding target region in the queue, we might not be able to move the object, or we might only be able to move it within its own region. The problem just becomes worse the larger the object is, and for objects larger than a region it is currently guaranteed that we cannot move them. Being able to move objects around large/unmovable objects would help a lot to reduce gaps.
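A hedged sketch of the fallback being described, reusing the hypothetical `TempRegion` from the earlier sketch (again, not the PR's actual code): when an object does not fit into the space remaining in the current target region and no earlier target region is queued, it stays in place and compaction continues behind it.

```c++
// Hypothetical sketch; not the PR's actual code.
HeapWord* forward_or_leave_in_place(oop obj, size_t size, TempRegion* target) {
  size_t room = pointer_delta(target->_end, target->_compact_top);
  if (size > room) {
    // Does not fit, and no preceding target region is available:
    // leave the object where it is; the gap before it becomes waste
    // (later covered by a dummy object).
    return cast_from_oop<HeapWord*>(obj);
  }
  HeapWord* dest = target->_compact_top;
  target->_compact_top += size;  // bump allocation within the target region
  return dest;
}
```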
Thanks! /integrate
Going to push as commit 4b6094a023e6d9f3c74a0afc65192d7e81c7e3a0.
Since your change was applied there has been 1 commit pushed to the master branch:
- e20e6c8450d355478a564f9b6147ab5bc0ffdf7d: 8353849: [Lilliput] Avoid race in compact identity hashcode
Your commit was automatically rebased without conflicts.
@rkennke Pushed as commit 4b6094a023e6d9f3c74a0afc65192d7e81c7e3a0.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.