Andy Grove

Results: 657 comments by Andy Grove

I tried running benchmarks with this PR but ran into:
```
Failed to allocate additional 917708800 bytes for ShuffleRepartitioner[0] with 0 bytes already allocated for this reservation - 858914816 bytes...
```

@Kontinuation Is this the general approach you were suggesting?

I was able to get benchmarks running by allocating more memory to Comet.

Thanks for the detailed feedback @Kontinuation. I plan to resume work on this today/tomorrow.

@Kontinuation Do you want to create a PR from your branch? I like the idea of having some different configurable options while we are experimenting with this.

Closing in favor of https://github.com/apache/datafusion-comet/pull/1021

Here is a teaser for the performance improvement. This is for TPC-H q11 (SF=100) with broadcast joins disabled (I am looking into a regression with those). I ran the query...

Current benchmarks:
![tpch_allqueries](https://github.com/user-attachments/assets/36950a1b-40e0-46db-a476-287cfbd59909)
Speedup of using HashJoin instead of SortMergeJoin:
![tpch_queries_speedup](https://github.com/user-attachments/assets/2344a59a-1285-4104-b3d6-d86a13ef3995)

I will add documentation to this PR today, explaining pros/cons of this feature in our tuning guide.

@viirya @parthchandra This is now ready for review. The new option is disabled by default and I added a section to the tuning guide explaining why users may want to...