datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

fix: Optimize not to call getNullCount as much as possible

Open kazuyukitanimura opened this issue 1 year ago • 10 comments

Which issue does this PR close?

Rationale for this change

This PR reduces to call getNullCount as much as possible

What changes are included in this PR?

Calling getNullCount is expensive so reusing the number rather than re-calculating

How are these changes tested?

Existing test

kazuyukitanimura avatar Aug 13 '24 04:08 kazuyukitanimura

Before

Screenshot 2024-08-13 at 3 22 53 PM

After

Screenshot 2024-08-13 at 3 22 38 PM

kazuyukitanimura avatar Aug 13 '24 22:08 kazuyukitanimura

Codecov Report

Attention: Patch coverage is 5.22388% with 127 lines in your changes missing coverage. Please review.

Project coverage is 54.02%. Comparing base (4fe43ad) to head (a773832). Report is 9 commits behind head on main.

Files Patch % Lines
...in/java/org/apache/arrow/c/CometArrayExporter.java 0.00% 54 Missing :warning:
...n/java/org/apache/comet/vector/CometMapVector.java 0.00% 16 Missing :warning:
...main/java/org/apache/comet/vector/CometVector.java 0.00% 13 Missing :warning:
...ava/org/apache/comet/vector/CometStructVector.java 0.00% 9 Missing :warning:
...ain/scala/org/apache/comet/vector/NativeUtil.scala 0.00% 8 Missing :warning:
.../java/org/apache/comet/vector/CometLazyVector.java 0.00% 6 Missing :warning:
...in/java/org/apache/comet/parquet/ColumnReader.java 0.00% 4 Missing :warning:
.../src/main/java/org/apache/comet/parquet/Utils.java 20.00% 4 Missing :warning:
...org/apache/comet/vector/CometDictionaryVector.java 0.00% 4 Missing :warning:
.../java/org/apache/comet/vector/CometListVector.java 0.00% 3 Missing :warning:
... and 4 more
Additional details and impacted files
@@              Coverage Diff              @@
##               main     #820       +/-   ##
=============================================
+ Coverage     33.94%   54.02%   +20.07%     
+ Complexity      874      854       -20     
=============================================
  Files           112      110        -2     
  Lines         42916    10559    -32357     
  Branches       9464     2000     -7464     
=============================================
- Hits          14567     5704     -8863     
+ Misses        25379     3857    -21522     
+ Partials       2970      998     -1972     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Aug 14 '24 01:08 codecov-commenter

@andygrove @comphead @huaxingao @parthchandra @viirya

kazuyukitanimura avatar Aug 14 '24 02:08 kazuyukitanimura

Somehow Run codecov/codecov-action@v3 keeps failing, seems not related but not sure why

kazuyukitanimura avatar Aug 14 '24 19:08 kazuyukitanimura

I am now running benchmarks with this PR

andygrove avatar Aug 15 '24 21:08 andygrove

I ran my local TPC-DS benchmark and it doesn't show any improvement for that benchmark.

tpcds_allqueries (4)

@kazuyukitanimura Do you see an improvement with any of the microbenchmark queries?

andygrove avatar Aug 15 '24 22:08 andygrove

Thanks @andygrove hmmm how many iteration was used for your local benchmark?

I used q27. I will try to run some microbenchmarks to showcase...

kazuyukitanimura avatar Aug 15 '24 23:08 kazuyukitanimura

Thanks @andygrove hmmm how many iteration was used for your local benchmark?

I used q27. I will try to run some microbenchmarks to showcase...

This was the average of 3 runs for all 99 TPC-DS queries @ 100 GB scale factor

andygrove avatar Aug 16 '24 00:08 andygrove

@andygrove I ran q27 30 times. The improvement is not as big as I hoped but it is improving

Before

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.6.1
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q27: Comet (Scan, Exec)                            4001           4183          98         72.5          13.8       1.0X

After

OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Mac OS X 14.6.1
Apple M1 Max
TPCDS Micro Benchmarks:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
q27: Comet (Scan, Exec)                            3847           4104         122         75.4          13.3       1.0X

kazuyukitanimura avatar Aug 16 '24 23:08 kazuyukitanimura

Thanks, I realized there might be a better way to do this. Please hold merging.

kazuyukitanimura avatar Aug 19 '24 19:08 kazuyukitanimura

opened https://github.com/apache/datafusion-comet/pull/1055 instead

kazuyukitanimura avatar Nov 05 '24 01:11 kazuyukitanimura