pinot
pinot copied to clipboard
Use ArrayList instead of LinkedList in SortOperator
A small PR that changes SortOperator to buffer entries in an ArrayList instead of a LinkedList.
In general LinkedList performance is horrible, even in cases when theoretically (by Big-O) they are fine, usually the performance cost is worse than ArrayList due to memory amplification and cache issues. In the specific case this PR is changing, there was no actual reason to use LinkedList apart from a slightly nicer code. But given the size of the final result is well know, it was very easy to change it to an ArrayList implementation where the ArrayList is initialized to all values being null and then set values in the reverse direction. Amortized BigO cost is still linear, but locallity and allocation should be quite better in this implementation.
A secondary but difficult to prove improvement is related to megamorphic calls. As we mostly always use ArrayList in our code, the JIT can generate code that assumes that. Sometimes we need to use different lists, but if we can avoid that we theoretically can produce better code.
Probably the performance cost will be negligible in most cases and some of the actual cases (like the megamorphic calls) cannot be easilly benchmarked with JMH, so I'm not adding these benchmarks in this PR.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 62.16%. Comparing base (
59551e4) to head (153c3d6). Report is 2379 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #12783 +/- ##
============================================
+ Coverage 61.75% 62.16% +0.41%
+ Complexity 207 198 -9
============================================
Files 2436 2502 +66
Lines 133233 136586 +3353
Branches 20636 21145 +509
============================================
+ Hits 82274 84908 +2634
- Misses 44911 45401 +490
- Partials 6048 6277 +229
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration1 | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration2 | 0.00% <0.00%> (ø) |
|
| java-11 | 62.12% <100.00%> (+0.41%) |
:arrow_up: |
| java-21 | 62.04% <100.00%> (+0.42%) |
:arrow_up: |
| skip-bytebuffers-false | 62.15% <100.00%> (+0.40%) |
:arrow_up: |
| skip-bytebuffers-true | 62.02% <100.00%> (+34.30%) |
:arrow_up: |
| temurin | 62.16% <100.00%> (+0.41%) |
:arrow_up: |
| unittests | 62.16% <100.00%> (+0.41%) |
:arrow_up: |
| unittests1 | 46.71% <100.00%> (-0.18%) |
:arrow_down: |
| unittests2 | 27.94% <0.00%> (+0.21%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Theoretically, I am aligned on leveraging ArrayList for memory efficiency and cache friendliness if we know size upfront.
Are you aware of any tools that can be used to test cache improvements ?
Have you done some tests for the memory part via VisualVM / Yourkit ?
PS - I am ok with the PR. Just curious in general about tools that can be used to test such PRs in future.
Are you aware of any tools that can be used to test cache improvements ?
We can easily create a JMH benchmark that tests list vs array list in this kind of code. I think it is not worth to create a JMH benchmark that test the operator itself because it would be quite complex.
I'm not aware of other tools, but each linked list node contains a reference to its value and a reference to the next node in the link. That means we need at least 8 bytes (assuming compressed pointers) for each node. In theory these nodes may be spread in the heap, although my experience tell me that they are almost always to be one after the other in the heap because we allocate them by the same thread with no allocations in the middle. Future GCs may move them, but sounds pretty unlikely.
While LinkedList requires 8 bytes per element that will probably but not for sure be contiguous in memory, Array list requires just 4 bytes per element that are going to be contiguous in memory for sure. Therefore cache usage is always better with ArrayList than with LinkedList.
I've applied the suggested changes. Please take a look.