Nick Karlov comments

Results: 19 comments by Nick Karlov

[EPIC] (Even More) Grouping / Group By / Aggregation Performance

> Looking at the trace in @alamb I'd like to mention that extending the mutable batch takes a lot of time (MutableArrayData::Extend, utils::extend_offsets), as does the related allocator work. I suppose that...

[EPIC] (Even More) Grouping / Group By / Aggregation Performance

> Here is one idea for doing so: #9403 I thought over the join issue for the case when the left table may not be columnar. For instance, let's consider `Events` and...

[EPIC] (Even More) Grouping / Group By / Aggregation Performance

> I wonder if we could combine this with something like #7955 🤔 It's quite a good idea! But I think it's tricky to push the ON condition down. The...

[EPIC] (Even More) Grouping / Group By / Aggregation Performance

> DictionaryArray DictionaryArray is something different. It is the best choice for low-cardinality columns (how to efficiently encode data in a single column to save space and increase performance...

Use btree to search fields in DFSchema

> Made a [benchmark](https://github.com/apache/arrow-datafusion/pull/7948). > > ## Baseline - Data Fusion 32 ([a0c5aff](https://github.com/apache/arrow-datafusion/commit/a0c5affca271d67980286cb2ae08ea8eec75a326)) > ``` > index_of_column_by_name 10 > time: [11.323 ns 11.325 ns 11.328 ns] > change: [-0.0714% +0.3045%...

Use btree to search fields in DFSchema

> Thank you -- I plan to review this more carefully tomorrow @alamb I think it's a good idea to introduce user defined cacheprovider for both DFSchema and arrow Schema....

[Epic] A collection of issues to improve planning performance / speed / efficiency

Also, I'd like to consider replacing the list in DFSchema with [case_insensitive_hashmap](https://docs.rs/case_insensitive_hashmap/latest/case_insensitive_hashmap/) or something similar in order to look up a value in O(1) instead of O(N). As I understand, now complexity...
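
A minimal sketch of the lookup idea in the excerpt above, using only the Rust standard library. `FieldIndex` and its methods are hypothetical names for illustration; this is not the real DFSchema and does not use the linked crate. It only contrasts a linear scan over field names with a hash map keyed by the ASCII-lowercased name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema's field list (not the real DFSchema).
struct FieldIndex {
    /// Field names in declaration order.
    names: Vec<String>,
    /// Assumed side index: ASCII-lowercased name -> position in `names`.
    by_lower: HashMap<String, usize>,
}

impl FieldIndex {
    fn new(names: &[&str]) -> Self {
        let names: Vec<String> = names.iter().map(|n| n.to_string()).collect();
        let mut by_lower = HashMap::with_capacity(names.len());
        for (i, n) in names.iter().enumerate() {
            // If two names collide after lowercasing, keep the first one.
            by_lower.entry(n.to_ascii_lowercase()).or_insert(i);
        }
        Self { names, by_lower }
    }

    /// O(N): scans the list and compares names case-insensitively.
    fn index_of_linear(&self, name: &str) -> Option<usize> {
        self.names.iter().position(|n| n.eq_ignore_ascii_case(name))
    }

    /// Average O(1): a single hash lookup on the lowercased name.
    fn index_of_hashed(&self, name: &str) -> Option<usize> {
        self.by_lower.get(&name.to_ascii_lowercase()).copied()
    }
}
```

For example, with `FieldIndex::new(&["Id", "UserName", "created_at"])`, both `index_of_linear("USERNAME")` and `index_of_hashed("username")` return `Some(1)`. The hashed variant pays for building the map once and for allocating a lowercased key on each lookup, so whether it wins for small schemas is something a benchmark would have to show.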

[Epic] A collection of issues to improve planning performance / speed / efficiency

@alamb Hi, amazing work has been done! It has become much faster. But it seems that the complexity of the algorithms is still O(n^2). Here we have graph avg query execution...

Add push down sort to the source (table provider)

> Possibly related: #7871 @alamb Thank you for the reply! I've read the discussion in #7871 and think that this case is different. I don't want to say that MySourceExec can...