
Use ArrayList instead of LinkedList in SortOperator

Open gortiz opened this issue 1 year ago • 3 comments

A small PR that changes SortOperator to buffer entries in an ArrayList instead of a LinkedList.

In general, LinkedList performance is poor, even in cases where it should theoretically be fine by Big-O analysis. In practice it usually costs more than ArrayList because of memory amplification (a separate node object per element) and poor cache locality. In the specific case this PR changes, there was no real reason to use LinkedList apart from slightly nicer code. Since the size of the final result is well known, it was easy to switch to an ArrayList that is initialized with every value set to null and then filled in reverse order. The amortized Big-O cost is still linear, but locality and allocation should be considerably better with this implementation.
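A minimal sketch of the two strategies, assuming (hypothetically) that the operator keeps the top `limit` rows in a `PriorityQueue` whose `poll()` returns the worst remaining row first. The class and method names (`ReverseFillSketch`, `topN`, `drainWithLinkedList`, `drainWithArrayList`) are illustrative, not Pinot's actual code. Because rows come out worst-first, the LinkedList version prepends each one, while the ArrayList version pre-fills the list with nulls and writes each row at a decreasing index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: drain a bounded top-N heap into a list in final sort order.
public class ReverseFillSketch {

  // Keeps the `limit` smallest values. The heap is a max-heap, so poll() yields the
  // largest kept value first, i.e. the rows come out in reverse of the final order.
  static PriorityQueue<Integer> topN(int[] values, int limit) {
    PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
    for (int v : values) {
      heap.offer(v);
      if (heap.size() > limit) {
        heap.poll(); // evict the current largest
      }
    }
    return heap;
  }

  // LinkedList approach: prepend each polled element.
  static List<Integer> drainWithLinkedList(PriorityQueue<Integer> heap) {
    LinkedList<Integer> rows = new LinkedList<>();
    while (!heap.isEmpty()) {
      rows.addFirst(heap.poll());
    }
    return rows;
  }

  // ArrayList approach: the size is known up front, so pre-fill with nulls
  // and write each element from the back towards the front.
  static List<Integer> drainWithArrayList(PriorityQueue<Integer> heap) {
    int size = heap.size();
    List<Integer> rows = new ArrayList<>(Collections.nCopies(size, null));
    for (int i = size - 1; i >= 0; i--) {
      rows.set(i, heap.poll());
    }
    return rows;
  }

  public static void main(String[] args) {
    int[] values = {7, 3, 9, 1, 8, 2, 6};
    System.out.println(drainWithLinkedList(topN(values, 4))); // [1, 2, 3, 6]
    System.out.println(drainWithArrayList(topN(values, 4)));  // [1, 2, 3, 6]
  }
}
```

Both versions produce the same list; the difference is that the ArrayList version allocates a single backing array instead of one node per row.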

A secondary, but hard to prove, improvement is related to megamorphic calls (call sites that see many different receiver types, which the JIT cannot easily inline). Since we almost always use ArrayList in our code, the JIT can generate code that assumes it. Sometimes we need to use other list implementations, but when we can avoid that, we can in theory get better code.
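To illustrate the megamorphic concern, here is a hypothetical helper (`CallSiteSketch.sum`) whose `List` calls form single call sites. HotSpot profiles the receiver types seen at each virtual or interface call; with one or two types it can typically inline the target behind a cheap type check, but once three or more types show up the site is treated as megamorphic and falls back to ordinary interface dispatch. This only shows the shape of the problem: the JIT's decisions are not visible in the program's output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Illustrative only: the JIT behaviour described in the comments is not observable here.
public class CallSiteSketch {

  // The for-each loop compiles to rows.iterator(), it.hasNext() and it.next() calls.
  // If every caller passes an ArrayList, those call sites stay monomorphic and the
  // JIT can usually inline ArrayList's iterator methods.
  static long sum(List<Long> rows) {
    long total = 0;
    for (long v : rows) {
      total += v;
    }
    return total;
  }

  public static void main(String[] args) {
    List<Long> a = new ArrayList<>(Arrays.asList(1L, 2L, 3L));
    List<Long> b = new LinkedList<>(Arrays.asList(4L, 5L));
    List<Long> c = Arrays.asList(6L);
    // Passing three different List implementations to the same method gives its call
    // sites three receiver types: the megamorphic case the PR tries to avoid.
    System.out.println(sum(a) + sum(b) + sum(c)); // 21
  }
}
```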

The performance impact will probably be negligible in most cases, and some of the effects (like the megamorphic calls) cannot easily be benchmarked with JMH, so I'm not adding benchmarks in this PR.

gortiz · Apr 03 '24 10:04

Codecov Report

All modified and coverable lines are covered by tests.

Project coverage is 62.16%. Comparing base (59551e4) to head (153c3d6). Report is 2379 commits behind head on master.

@@             Coverage Diff              @@
##             master   #12783      +/-   ##
============================================
+ Coverage     61.75%   62.16%   +0.41%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2502      +66     
  Lines        133233   136586    +3353     
  Branches      20636    21145     +509     
============================================
+ Hits          82274    84908    +2634     
- Misses        44911    45401     +490     
- Partials       6048     6277     +229     
Flag                    Coverage Δ
custom-integration1     <0.01% <0.00%> (-0.01%)
integration             <0.01% <0.00%> (-0.01%)
integration1            <0.01% <0.00%> (-0.01%)
integration2            0.00% <0.00%> (ø)
java-11                 62.12% <100.00%> (+0.41%)
java-21                 62.04% <100.00%> (+0.42%)
skip-bytebuffers-false  62.15% <100.00%> (+0.40%)
skip-bytebuffers-true   62.02% <100.00%> (+34.30%)
temurin                 62.16% <100.00%> (+0.41%)
unittests               62.16% <100.00%> (+0.41%)
unittests1              46.71% <100.00%> (-0.18%)
unittests2              27.94% <0.00%> (+0.21%)

Flags with carried forward coverage won't be shown.


codecov-commenter · Apr 03 '24 10:04

In theory, I agree with using ArrayList for memory efficiency and cache friendliness when we know the size up front.

Are you aware of any tools that can be used to test cache improvements?

Have you done any tests of the memory side with VisualVM or YourKit?

PS: I'm OK with the PR. I'm just curious, in general, about tools that could be used to test PRs like this in the future.

siddharthteotia · Apr 04 '24 05:04

> Are you aware of any tools that can be used to test cache improvements?

We can easily create a JMH benchmark that compares LinkedList and ArrayList in this kind of code. I don't think it is worth creating a JMH benchmark for the operator itself, because it would be quite complex.
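JMH is the right tool for real numbers. As a rough, dependency-free stand-in, the hypothetical `ListFillTiming` class below times building a list of known size by prepending to a LinkedList versus reverse-filling a pre-sized ArrayList, then iterating over the result. It has no forking and no controlled warm-up, so its timings are noisy and only indicative; the sizes and round counts are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Rough timing loop, NOT a substitute for JMH: results vary from run to run.
public class ListFillTiming {

  // Prepend n values so the list ends up ascending (LinkedList + addFirst).
  static List<Integer> fillLinked(int n) {
    LinkedList<Integer> list = new LinkedList<>();
    for (int i = n - 1; i >= 0; i--) {
      list.addFirst(i);
    }
    return list;
  }

  // Pre-size with nulls, then write from the back (ArrayList + set).
  static List<Integer> fillArray(int n) {
    List<Integer> list = new ArrayList<>(Collections.nCopies(n, null));
    for (int i = n - 1; i >= 0; i--) {
      list.set(i, i);
    }
    return list;
  }

  // Iterate over the result so the read side is part of the measurement.
  static long consume(List<Integer> list) {
    long sum = 0;
    for (int v : list) {
      sum += v;
    }
    return sum;
  }

  public static void main(String[] args) {
    int n = 100_000;
    long sink = 0;
    for (int round = 0; round < 5; round++) { // early rounds double as a crude warm-up
      long t0 = System.nanoTime();
      sink += consume(fillLinked(n));
      long t1 = System.nanoTime();
      sink += consume(fillArray(n));
      long t2 = System.nanoTime();
      System.out.printf("round %d: LinkedList %d us, ArrayList %d us%n",
          round, (t1 - t0) / 1_000, (t2 - t1) / 1_000);
    }
    System.out.println("checksum " + sink); // use the results so the work is not discarded
  }
}
```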

I'm not aware of other tools, but each java.util.LinkedList node is a separate object that holds a reference to its value plus references to the next and previous nodes (the list is doubly linked). That means at least 12 bytes of references per node (assuming compressed pointers), plus the node's own object header. In theory these nodes may be scattered across the heap, although in my experience they almost always end up next to each other, because they are allocated by the same thread with no other allocations in between. A later GC may move them, but that seems pretty unlikely.

So while LinkedList needs at least 12 bytes per element (plus a node header) that will probably, but not certainly, be contiguous in memory, ArrayList needs just 4 bytes per element, and those are guaranteed to be contiguous. Therefore, as far as the list structure itself is concerned, ArrayList always makes better use of the cache than LinkedList.
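The per-element overhead can be made concrete with a small back-of-the-envelope model. It assumes a 64-bit HotSpot JVM with compressed pointers and compressed class pointers (4-byte references, a 12-byte object header, a 16-byte array header, 8-byte alignment); these are typical defaults, not guarantees. The element objects are ignored because both lists reference the same ones, and each LinkedList node is modeled with three references (item, next, prev) since `java.util.LinkedList` is doubly linked. `FootprintModel` is a hypothetical name.

```java
// Back-of-the-envelope container overhead under the assumptions stated above.
public class FootprintModel {
  static final int REF = 4;           // compressed reference
  static final int OBJ_HEADER = 12;   // mark word (8) + compressed class pointer (4)
  static final int ARRAY_HEADER = 16; // object header (12) + array length (4)
  static final int ALIGN = 8;         // object alignment

  static long align(long bytes) {
    return (bytes + ALIGN - 1) / ALIGN * ALIGN;
  }

  // One node object per element, each holding item, next and prev references.
  static long linkedListNodeBytes(long n) {
    return n * align(OBJ_HEADER + 3L * REF);
  }

  // One exactly sized Object[] holding a reference per element.
  static long arrayListBackingBytes(long n) {
    return align(ARRAY_HEADER + n * REF);
  }

  public static void main(String[] args) {
    long n = 1_000;
    System.out.println("LinkedList nodes:   " + linkedListNodeBytes(n) + " bytes");
    System.out.println("ArrayList Object[]: " + arrayListBackingBytes(n) + " bytes");
  }
}
```

Under these assumptions, 1,000 elements cost 24,000 bytes of LinkedList nodes (24 bytes each) versus 4,016 bytes for the ArrayList's backing array.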

gortiz · Apr 04 '24 16:04

I've applied the suggested changes. Please take a look.

gortiz · Apr 25 '24 08:04