Andy Grove comments

Results: 657 comments of Andy Grove

Performance regression after adding support for SMJ with join filter

Disabling sortMergeJoin via configs restores the original performance.

Performance regression after adding support for SMJ with join filter

I ran again with latest from main (0033), and then with SMJ + join filter disabled manually (0034). Here are the event logs. [app-20240904131653-0033.gz](https://github.com/user-attachments/files/16877346/app-20240904131653-0033.gz) [app-20240904132048-0034.gz](https://github.com/user-attachments/files/16877355/app-20240904132048-0034.gz)

Performance regression after adding support for SMJ with join filter

Here is a screenshot comparing the plans with SMJ+filter enabled on the left and disabled on the right. ![Screenshot from 2024-09-04 13-34-37](https://github.com/user-attachments/assets/3ea2b96f-767d-4fda-b627-77e083b8a51f)

Optimize filters to remove redundant IsNotNull checks

The `Display` implementation for `ScalarValue` changed between DataFusion 37 (the version that Ballista is using) and the version that Comet uses. In the older version, Date32 is shown as an...

Optimize filters to remove redundant IsNotNull checks

I tested a prototype of optimizing this filter and saw a 7% improvement in filter time for this query. It seems worth implementing.
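To illustrate the kind of rewrite being discussed, here is a minimal sketch in Python. It is not the prototype mentioned above and not Comet's or DataFusion's code. The `IsNotNull` and `Compare` classes, the `remove_redundant_is_not_null` function, and the column names are all hypothetical. It assumes the filter is a conjunction (AND) of predicates, and that a comparison against a non-null literal already evaluates to null, and is therefore rejected by the filter, when the column is null.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsNotNull:
    """Hypothetical predicate: column IS NOT NULL."""
    column: str


@dataclass(frozen=True)
class Compare:
    """Hypothetical predicate: column <op> literal."""
    column: str
    op: str
    literal: object


def remove_redundant_is_not_null(conjuncts):
    """Drop IsNotNull(c) when another AND-ed comparison already rejects null c."""
    # Columns covered by a comparison against a non-null literal.
    null_rejected = {
        p.column
        for p in conjuncts
        if isinstance(p, Compare) and p.literal is not None
    }
    return [
        p
        for p in conjuncts
        if not (isinstance(p, IsNotNull) and p.column in null_rejected)
    ]


# Hypothetical filter: the first IsNotNull is implied by the comparison.
filters = [
    IsNotNull("l_shipdate"),
    Compare("l_shipdate", "<=", "1998-09-02"),
    IsNotNull("l_comment"),
]
optimized = remove_redundant_is_not_null(filters)
```

The check on `l_comment` is kept because no other predicate in the conjunction rejects nulls for that column. The rewrite is only valid inside a conjunction; under OR or NOT the null check may still be needed.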

Optimize filters to remove redundant IsNotNull checks

> This might work ok for tpc-h but tpc-ds data has nulls and the null check is required perhaps? Does ballista know about the nullability of the data?

Yes, the...

Implement Common Subexpression Elimination optimizer rule

Related to this, it would be nice if we could improve the metrics for CometHashAggregate to show the time for evaluating the aggregate input expressions. I am not sure how...

Implement Common Subexpression Elimination optimizer rule

> Good finding. I think this kind of optimization should be in Spark optimizer instead.

It would make sense for Spark to add this, but I think that it could...

Implement Common Subexpression Elimination optimizer rule

There is now a DataFusion PR to add this feature: https://github.com/apache/datafusion/pull/13046

Implement Common Subexpression Elimination optimizer rule

The DataFusion PR https://github.com/apache/datafusion/pull/13046 is still waiting for a review. I am adding this issue back onto the 0.6 milestone as a reminder.
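For readers unfamiliar with the technique, the following is a minimal Python sketch of common subexpression elimination over a toy expression tree. It is not the implementation in the DataFusion PR or in Comet. Expressions are modelled as nested tuples `(op, arg, ...)`, and the function names, the `__cse_N` naming scheme, and the column names are assumptions made for illustration.

```python
from collections import Counter


def subexpressions(expr):
    """Yield every non-leaf subexpression of a nested-tuple expression."""
    if isinstance(expr, tuple):
        yield expr
        for arg in expr[1:]:
            yield from subexpressions(arg)


def eliminate_common_subexpressions(exprs):
    """Replace subexpressions that occur more than once with a shared name.

    Returns (definitions, rewritten), where definitions maps each generated
    name to the expression it stands for.
    """
    counts = Counter(sub for e in exprs for sub in subexpressions(e))
    names = {}  # repeated expression -> generated name

    def rewrite(expr):
        if not isinstance(expr, tuple):
            return expr  # leaf: column name or literal
        if counts[expr] > 1:
            # Replace the largest repeated subtree and do not descend further.
            if expr not in names:
                names[expr] = f"__cse_{len(names)}"
            return names[expr]
        return (expr[0],) + tuple(rewrite(arg) for arg in expr[1:])

    rewritten = [rewrite(e) for e in exprs]
    definitions = {name: expr for expr, name in names.items()}
    return definitions, rewritten


# Two hypothetical aggregate inputs that share price * (1 - discount).
discounted = ("*", "price", ("-", 1, "discount"))
aggregates = [
    ("sum", discounted),
    ("sum", ("*", discounted, ("+", 1, "tax"))),
]
definitions, rewritten = eliminate_common_subexpressions(aggregates)
```

In this sketch the shared product would be evaluated once as `__cse_0` and both aggregates would read that result. A real optimizer rule also has to account for non-deterministic expressions and short-circuit evaluation, which this sketch ignores.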