Andy Grove
I tried running benchmarks with this PR but ran into: ``` Failed to allocate additional 917708800 bytes for ShuffleRepartitioner[0] with 0 bytes already allocated for this reservation - 858914816 bytes...
@Kontinuation Is this the general approach you were suggesting?
I was able to get benchmarks running by allocating more memory to Comet.
Thanks for the detailed feedback @Kontinuation. I plan to resume work on this today/tomorrow.
@Kontinuation Do you want to create a PR from your branch? I like the idea of having several configurable options while we are experimenting with this.
Closing in favor of https://github.com/apache/datafusion-comet/pull/1021
Here is a teaser for the performance improvement. This is for TPC-H q11 (SF=100) with broadcast joins disabled (I am looking into a regression with those). I ran the query...
Current benchmarks: [Figure: speedup of using HashJoin instead of SortMergeJoin]
I will add documentation to this PR today, explaining pros/cons of this feature in our tuning guide.
@viirya @parthchandra This is now ready for review. The new option is disabled by default and I added a section to the tuning guide explaining why users may want to...