BInwei Yang comments

Results: 167 comments of BInwei Yang

[VL] Add 3 configs of spill

To work around the result mismatch issue in https://github.com/facebookincubator/velox/issues/9219, you can set MaxSpillRunRows and maxSpillFileSize to very large values.

Add RowsStreamingWindowBuild to avoid OOM in Window operator

@mbasmanova @aditi-pandit Do you have any more comments? This PR is meant to fix the TPCDS Q67 issue.

Support complex types in sparksql hash and xxhash64 function

@mbasmanova Any more comments? The function is used by Gluten columnar shuffle.

Support complex types in sparksql hash and xxhash64 function

> @mbasmanova @pedroerp I created a benchmark in [9265b97](https://github.com/facebookincubator/velox/commit/9265b975152586ebdda5576314133a3c24120f53)
>
> Benchmark Result at commit [569500d](https://github.com/facebookincubator/velox/commit/569500df1ffb26642c44233af75941561cd10c13)
>
> ```
> ============================================================================
> [...]hmarks/ExpressionBenchmarkBuilder.cpp     relative  time/iter   iters/s
> ============================================================================
> hash_ARRAY##hash...
> ```

Support complex types in sparksql hash and xxhash64 function

> > I didn't go through your code. If I understand correctly, it's the comparison between an indirect branch prediction vs. direct branch prediction. > > If the prediction is...

Support complex types in sparksql hash and xxhash64 function

@mbasmanova, @marin-ma collected the data below. The conclusion is to use the virtual function call, which performs 15% better. The root cause is that it executes 20% fewer instructions....
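
To make the two alternatives being compared concrete, here is a minimal sketch of per-row hashing with a `switch` on a type tag versus a virtual call on a hasher resolved once per column. All names (`Kind`, `Column`, `Hasher`, `mix`, and so on) are hypothetical and the mixing step is a toy; this is not the code from the PR and it does not reproduce Spark's hash or xxhash64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical type tags and column layout, for illustration only.
enum class Kind { kInt32, kInt64 };

struct Column {
  Kind kind;
  std::vector<int64_t> values;
};

// Toy mixing step (an assumption; NOT Spark's Murmur3 or xxhash64).
inline uint64_t mix(uint64_t seed, uint64_t v) {
  return (seed ^ v) * 0x9E3779B97F4A7C15ULL;
}

// Variant A: switch on the type tag for every column of every row.
uint64_t hashRowSwitch(const std::vector<Column>& cols, size_t row, uint64_t seed) {
  for (const auto& c : cols) {
    switch (c.kind) {
      case Kind::kInt32:
        seed = mix(seed, static_cast<uint32_t>(c.values[row]));
        break;
      case Kind::kInt64:
        seed = mix(seed, static_cast<uint64_t>(c.values[row]));
        break;
    }
  }
  return seed;
}

// Variant B: resolve the type once into a hasher object, then call it virtually.
struct Hasher {
  virtual ~Hasher() = default;
  virtual uint64_t hash(size_t row, uint64_t seed) const = 0;
};

struct Int32Hasher : Hasher {
  explicit Int32Hasher(const Column& c) : col(c) {}
  uint64_t hash(size_t row, uint64_t seed) const override {
    return mix(seed, static_cast<uint32_t>(col.values[row]));
  }
  const Column& col;
};

struct Int64Hasher : Hasher {
  explicit Int64Hasher(const Column& c) : col(c) {}
  uint64_t hash(size_t row, uint64_t seed) const override {
    return mix(seed, static_cast<uint64_t>(col.values[row]));
  }
  const Column& col;
};

// The type dispatch happens here, once per column instead of once per row.
// The returned hashers reference `cols`, which must outlive them.
std::vector<std::unique_ptr<Hasher>> makeHashers(const std::vector<Column>& cols) {
  std::vector<std::unique_ptr<Hasher>> hashers;
  for (const auto& c : cols) {
    switch (c.kind) {
      case Kind::kInt32:
        hashers.push_back(std::make_unique<Int32Hasher>(c));
        break;
      case Kind::kInt64:
        hashers.push_back(std::make_unique<Int64Hasher>(c));
        break;
    }
  }
  return hashers;
}

uint64_t hashRowVirtual(
    const std::vector<std::unique_ptr<Hasher>>& hashers, size_t row, uint64_t seed) {
  for (const auto& h : hashers) {
    seed = h->hash(row, seed);
  }
  return seed;
}
```

Both variants compute the same value for a given row; they differ only in where the type dispatch happens. The sketch says nothing about which one is faster; that depends on the measurements discussed in this thread.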

Support complex types in sparksql hash and xxhash64 function

@marin-ma Let's use the virtual function approach for now. Can you update the PR?

Support complex types in sparksql hash and xxhash64 function

> | | virtual | switch | virtual/switch |
> | --- | --- | --- | --- |
> | Instruction per loop | 14,714 | 18,584 | 0.792 |
> | IPC | 2.54 | 2.78 | |
> | loop per second | 533,513 | 462,064 | 1.155 |
> | branch misprediction ratio | 2.8% | 2.0% | |
>
> branch misprediction/1K inst...
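
As a rough consistency check on the quoted figures: if both variants ran at the same clock frequency (an assumption, not stated in the comment), loops per second should scale as IPC divided by instructions per loop. Taking the first number in each row as the virtual variant and the second as the switch variant:

$$
\frac{\text{loops/s}_{\text{virtual}}}{\text{loops/s}_{\text{switch}}}
\approx \frac{\mathrm{IPC}_{\text{virtual}} / I_{\text{virtual}}}{\mathrm{IPC}_{\text{switch}} / I_{\text{switch}}}
= \frac{2.54 / 14{,}714}{2.78 / 18{,}584}
\approx 1.154
$$

This is close to the measured ratio of 533,513 / 462,064 ≈ 1.155, so the lower instruction count more than offsets the lower IPC.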

Add RetryStrategy for S3 file system

@yma11 When the client_->HeadObject or client_->GetObject call fails, can we get the retry count from the outcome? If so, let's print the number in the error message, so the user can know...
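
A minimal sketch of what such an error message could look like, assuming the retry count is available on the outcome. `RequestOutcome` and `describeFailure` are hypothetical stand-ins written for this illustration; they are not the AWS SDK types or the Velox code, and whether the real outcome exposes the count is exactly the open question above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an SDK outcome object; NOT the AWS SDK type.
struct RequestOutcome {
  bool success;
  std::string errorMessage;
  unsigned retryCount;  // assumed to be recorded by the retry strategy
};

// Builds an error message that includes how many times the request was retried.
std::string describeFailure(
    const std::string& operation,
    const std::string& key,
    const RequestOutcome& outcome) {
  return operation + " failed for '" + key + "': " + outcome.errorMessage +
      " (retried " + std::to_string(outcome.retryCount) + " time(s))";
}
```

For example, a failed `HeadObject` outcome with `retryCount` set to 3 would yield a message ending in `(retried 3 time(s))`, which tells the user whether the configured retry strategy was actually exercised.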

Summary of Parquet reader Issues

Thank you! It's what we need. @yma11 can add more test results from parquet-mr later.