Mustafa Akur
I think combining consecutive ``` RepartitionExec(hash) -- RepartitionExec(round robin) ``` into a single ``` RepartitionExec(hash) (where inputs are hashed in parallel) ``` produces much more readable plans. And I presume it would be better...
> @mustafasrepo just to confirm, you think the first step would be to have a single RepartitionExec in the section of the query plan where we have two, is that...
As you say, `RepartitionExec::pull_from_input` is `async`. However, it starts a worker for each input partition. Hence, when the plan contains `RepartitionExec: partitioning=Hash([exprs, ...], 8), input_partitions=1`, it will open a single...
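To illustrate the point above, here is a minimal stand-in sketch (not DataFusion's actual code, and `spawn_pull_tasks` is a hypothetical name): one worker is started per *input* partition, so with `input_partitions=1` only a single worker pulls data, no matter how many output partitions the hash repartitioning produces.

```rust
use std::thread;

// Hedged sketch: start one worker per input partition, mirroring how a
// repartition operator pulls from each of its inputs concurrently.
// Each worker here just returns its partition index as a placeholder
// for "pulling batches from partition `p`".
fn spawn_pull_tasks(input_partitions: usize) -> usize {
    let handles: Vec<_> = (0..input_partitions)
        .map(|p| thread::spawn(move || p)) // one worker per input partition
        .collect();
    // Wait for all workers; the count equals the number of workers started.
    handles.into_iter().map(|h| h.join().unwrap()).count()
}
```

With `input_partitions=1` this spawns exactly one worker, which is why a plan with a single input partition cannot hash its input in parallel, regardless of the output partition count.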
> @mustafasrepo when writing `split_input_data` I haven't found a way to apply round robin without consuming the stream, is this expected?

Totally expected (at least to me).
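A minimal sketch of why that is expected (illustrative only; `split_round_robin` is a hypothetical helper, not DataFusion's API): each item must be pulled from the input before we know which output bucket receives it, so round-robin distribution necessarily consumes the stream.

```rust
// Hedged sketch: round-robin distribution of items from an input into
// `n` output buckets. The `for` loop drives (consumes) the input, which
// is unavoidable: the item at position `i` goes to bucket `i % n`, and
// we only learn `i` by pulling items in order.
fn split_round_robin<T>(input: impl Iterator<Item = T>, n: usize) -> Vec<Vec<T>> {
    let mut buckets: Vec<Vec<T>> = (0..n).map(|_| Vec::new()).collect();
    for (i, item) in input.enumerate() {
        buckets[i % n].push(item);
    }
    buckets
}
```

The same argument applies to an async `Stream` of `RecordBatch`es: the splitter must poll the stream to route each batch.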
> Still work in progress, @mustafasrepo you recommended the `split` to return a `Vec`, however once I have a `RecordBatch` I got lost a bit. Should that part be implemented...
I think having a dedicated config setting is more verbose and clearer (as in `prefer_existing_union`). If we were to use `prefer_existing_sort`, that might also work. However, if the condition to replace...
When I execute this query, I noticed that sometimes it works and sometimes it fails. It seems like a strange bug.
@comphead, I debugged this behaviour as well. It seems that during concatenation in the `BoundedWindowAggExec`, we were effectively using the `schema` of the first batch in the partition (the schema of...
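To make the failure mode above concrete, here is a hedged stand-in sketch (strings substitute for Arrow schemas; `check_same_schema` is a hypothetical helper, not the actual fix): concatenation is only sound when every batch in the partition shares one schema, and silently adopting the first batch's schema hides any later mismatch until execution fails.

```rust
// Hedged sketch: validate that all batches agree on a single schema
// before concatenating, instead of trusting the first batch's schema.
// Plain `&str` values stand in for Arrow `SchemaRef`s here.
fn check_same_schema(schemas: &[&str]) -> Result<(), String> {
    match schemas.split_first() {
        None => Ok(()), // nothing to concatenate
        Some((first, rest)) => {
            for s in rest {
                if s != first {
                    // This is the case that intermittently fails when it
                    // goes undetected and the first schema is assumed.
                    return Err(format!("schema mismatch: {s} vs {first}"));
                }
            }
            Ok(())
        }
    }
}
```

An intermittent failure fits this picture: whether the query errors depends on which batch happens to arrive first in the partition.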
> > In this case, should we add a function-level optimizer trying to rewrite the function? I think it would be a large match expression. I have done a similar...
When I run the queries as described in the issue body, I get the following plan: ``` logical_plan TableScan: t projection=[x], unsupported_filters=[t.y > Int64(0)] physical_plan MemoryExec: partitions=1, partition_sizes=[1] ```...