Daniël Heres
I think @Tmonster runs them once in a while
Thanks @Tmonster, we will probably make a PR soon for DataFusion 42, as many performance improvements were added compared to 41 and DataFusion will perform better for aggregations.
Is the issue that we can remove tasks/partitions/stages that are empty or is it a bug?
FYI @alamb @houqp
Hi @jackwener . The idea is that any expression within DataFusion receives a name, so the nodes in a LogicalPlan use the `NamedExpr` type. The name in this type is...
In what situations would these changes lead to better performance? I.e., why is query 28 ~1.10x faster?
(Or is it just benchmark noise?)
It would be worthwhile to run the `clickbench_extended` benchmarks as well (`./bench.sh run clickbench_extended`)
@mingmwang I think the same thing applies to broadcast exchanges as to normal exchanges: they are spilled to disk by default and might be kept in memory if the memory budget allows....
That's a good observation @mingmwang ! The difference with CollectLeft is that that mode collects the left side to one partition, whereas with broadcast we would broadcast the output of...