gctoolkit
gctoolkit copied to clipboard
Generational ZGC incorrect handling of multiple concurrent collections
Hi. #414 added generational ZGC support, however, there are some corner cases that are currently handled incorrectly.
- It assumes that a major cycle contains exactly 1 young collection, and exactly 1 old collection. This is not the case. A major can have exactly 1 old collection, and at least one young collection (I have seen examples with 2 young collections in a single major cycle, I don't know if it's possible to have 3 or more)
- The parser has a single forward reference. If a major cycle starts, and then a minor cycle starts before that major cycle has finished, then the parser will discard the information from the major cycle
Example:
❯ egrep -e '\[gc,phases.+(Young|Old) Generation' gc.log
[2024-11-19T09:48:31.967-0600][info][gc,phases ] GC(0) Y: Young Generation
[2024-11-19T09:48:31.984-0600][info][gc,phases ] GC(0) Y: Young Generation 88M(1%)->54M(0%) 0.017s
[2024-11-19T09:48:31.984-0600][info][gc,phases ] GC(0) O: Old Generation
[2024-11-19T09:48:31.997-0600][info][gc,phases ] GC(0) O: Old Generation 54M(0%)->56M(0%) 0.013s
[2024-11-19T09:48:38.557-0600][info][gc,phases ] GC(1) Y: Young Generation
[2024-11-19T09:48:39.641-0600][info][gc,phases ] GC(1) Y: Young Generation 828M(5%)->1392M(8%) 1.084s
[2024-11-19T09:48:39.641-0600][info][gc,phases ] GC(1) O: Old Generation
[2024-11-19T09:48:39.698-0600][info][gc,phases ] GC(1) O: Old Generation 1392M(8%)->1506M(9%) 0.056s
[2024-11-19T09:48:39.753-0600][info][gc,phases ] GC(2) Y: Young Generation
[2024-11-19T09:48:42.252-0600][info ][gc,phases ] GC(2) Y: Young Generation 1642M(10%)->7986M(49%) 2.499s
[2024-11-19T09:48:42.252-0600][info ][gc,phases ] GC(2) O: Old Generation
[2024-11-19T09:48:42.498-0600][info ][gc,phases ] GC(2) O: Old Generation 8002M(49%)->8920M(54%) 0.246s
[2024-11-19T09:48:42.508-0600][info ][gc,phases ] GC(3) Y: Young Generation
[2024-11-19T09:48:47.883-0600][info ][gc,phases ] GC(3) Y: Young Generation 8962M(55%)->13304M(81%) 5.370s
[2024-11-19T09:48:47.883-0600][info ][gc,phases ] GC(3) O: Old Generation
[2024-11-19T09:48:47.883-0600][info ][gc,phases ] GC(4) y: Young Generation
[2024-11-19T09:48:52.821-0600][info ][gc,phases ] GC(4) y: Young Generation 13310M(81%)->15654M(96%) 4.926s
[2024-11-19T09:48:52.855-0600][info ][gc,phases ] GC(5) y: Young Generation
[2024-11-19T09:48:57.422-0600][info ][gc,phases ] GC(5) y: Young Generation 15890M(97%)->13096M(80%) 4.560s
[2024-11-19T09:48:57.431-0600][info ][gc,phases ] GC(3) O: Old Generation 13310M(81%)->15900M(97%) 9.540s
[2024-11-19T09:48:57.431-0600][info ][gc,phases ] GC(6) Y: Young Generation (Promote All)
[2024-11-19T09:49:02.950-0600][info ][gc,phases ] GC(6) Y: Young Generation (Promote All) 13184M(80%)->16384M(100%) 5.519s
[2024-11-19T09:49:02.950-0600][info ][gc,phases ] GC(6) Y: Young Generation (Collect Roots)
[2024-11-19T09:49:04.485-0600][info ][gc,phases ] GC(6) Y: Young Generation (Collect Roots) 16384M(100%)->12528M(76%) 1.535s
[2024-11-19T09:49:04.485-0600][info ][gc,phases ] GC(6) O: Old Generation
[2024-11-19T09:49:04.485-0600][info ][gc,phases ] GC(7) y: Young Generation
[2024-11-19T09:49:07.384-0600][info ][gc,phases ] GC(7) y: Young Generation 12550M(77%)->16384M(100%) 2.899s
[2024-11-19T09:49:07.385-0600][info ][gc,phases ] GC(8) y: Young Generation
[2024-11-19T09:49:09.226-0600][info ][gc,phases ] GC(8) y: Young Generation 16384M(100%)->15752M(96%) 1.838s
[2024-11-19T09:49:09.248-0600][info ][gc,phases ] GC(9) y: Young Generation
[2024-11-19T09:49:11.511-0600][info ][gc,phases ] GC(9) y: Young Generation 15760M(96%)->16286M(99%) 2.259s
[2024-11-19T09:49:11.515-0600][info ][gc,phases ] GC(6) O: Old Generation 12540M(77%)->16384M(100%) 7.026s
In the above log:
- major cycle GC(3) starts, and minor cycles 4 and 5 start and finish before GC(3) finishes. Because of the single forward reference, the information about GC(3) is lost
- major cycle GC(6) has 2 young collections
I'd like to take a crack at fixing this issue, but my solution will break API compatibility, so would like to get input from maintainers before proceeding.
We should change the structure of the ZGC events to report each individual collection (young/old) instead of the Major/Minor cycles. Modify the parser to hold multiple forward references (one for young, one for old, one for non-generational) and maintain their states independently of one another.
For what it's worth, I implemented a version of this solution in a private fork some months back (but didn't have time to upstream and now the repos have diverged a fair bit), so I know the general approach works.
@akiselev98 please go ahead and submit the PR! It's probably easier to look at the code and then discuss the API challenge.
I did some experimentation in this area which involved collecting the forward reference by GC id, then processing the forward reference once the "end event" was captured. I was with starting to work concurrent events. You can find this in the branch dagrieve/async-logs.