Nick Karlov
@alamb, thank you for the reply! I will continue posting about bottlenecks in DF (for instance, I've noticed degraded DF performance due to aggressive concurrency in the tokio scheduler and worked around...
> various ways to make DataFusion's planning faster

Also, it's worth considering prepared physical plans (with parametrization); that would make it possible to cache them
@alamb, please take a look at PR https://github.com/apache/arrow-datafusion/pull/7870, where @oleggator has implemented a BTree instead of a list. It made physical plan construction 2x faster
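For illustration only (this is not the code from that PR): a minimal sketch of why swapping a list for a BTree can help, assuming the hot path is a lookup by name. The helper names `index_in_list`, `build_index`, and `wide_fields` are hypothetical and exist only for this example.

```rust
use std::collections::BTreeMap;

/// Linear scan: every lookup walks the list, O(n) per call.
pub fn index_in_list(fields: &[String], name: &str) -> Option<usize> {
    fields.iter().position(|f| f.as_str() == name)
}

/// Build a name -> position index once; later lookups are O(log n).
pub fn build_index(fields: &[String]) -> BTreeMap<&str, usize> {
    let mut index = BTreeMap::new();
    for (i, f) in fields.iter().enumerate() {
        // Keep the first position if a name repeats, matching the linear scan.
        index.entry(f.as_str()).or_insert(i);
    }
    index
}

/// Hypothetical wide schema with `n` columns named col_0 .. col_{n-1}.
pub fn wide_fields(n: usize) -> Vec<String> {
    (0..n).map(|i| format!("col_{i}")).collect()
}
```

With a wide table, resolving every column through the list costs O(n) per lookup, so resolving all n columns is quadratic; the index is built once and each lookup afterwards is logarithmic.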
> it is still far from optimal

I think it's a good idea to cache instances of DFSchema (and Arrow Schema as well). Though the most flexible way is to implement...
Another thought is to cache the physical plan (I tested using the optimized physical plan, serialized to protobuf, as a cache, and it improved performance dramatically)
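A minimal sketch of the caching idea, assuming the cache key is the query text and the cached value is an opaque byte buffer (standing in for a serialized plan). `PlanCache`, `PlanBytes`, and `fake_planner` are hypothetical names for this example, not DataFusion APIs, and the real serialization step is not shown.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a serialized, optimized physical plan
/// (for example, protobuf bytes). Not a DataFusion type.
pub type PlanBytes = Arc<Vec<u8>>;

/// Minimal plan cache keyed by the query text.
pub struct PlanCache {
    plans: HashMap<String, PlanBytes>,
    /// Number of times the planner actually had to run.
    pub misses: usize,
}

impl PlanCache {
    pub fn new() -> Self {
        Self { plans: HashMap::new(), misses: 0 }
    }

    /// Return the cached plan for `sql`, running `plan_fn` only on a miss.
    pub fn get_or_plan<F>(&mut self, sql: &str, plan_fn: F) -> PlanBytes
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        if let Some(hit) = self.plans.get(sql) {
            return Arc::clone(hit);
        }
        self.misses += 1;
        let plan = Arc::new(plan_fn(sql));
        self.plans.insert(sql.to_string(), Arc::clone(&plan));
        plan
    }
}

/// Hypothetical "expensive" planner: here it just copies the query bytes.
pub fn fake_planner(sql: &str) -> Vec<u8> {
    sql.as_bytes().to_vec()
}
```

Calling `get_or_plan` twice with the same query text runs the planner once and hands back the same shared buffer the second time. A real cache would also need invalidation when table schemas change and some handling for query parameters, which this sketch leaves out.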
@alamb Hi! Could you please let us know if any work is planned here? We noticed that DataFusion's performance on wide tables slows down significantly from version...
@alamb, we ran the same perf test on 37.1, and it seems that 99% of request time is now spent on planning and optimizing (creating and optimizing the logical plan,...
Thank you for your reply, @alamb! We'll check it on 38 and share the results. This particular example is synthetic, as we implemented it using pure in-memory tables without any external...
Hi! Great job here! I ran into an issue with CoalesceBatches: it seems that there is a performance killer somewhere in CoalesceBatchesStream. It's spending too much time...
Another related issue is the performance of **RowConverter** used for grouping. More than 75% of GroupedHashAggregateStream work is converting the composite aggregation key to rows. Approx. 50% of GroupedHashAggregateStream work is...