Unbounded SortExec (and Top-K) Implementation When Req's Are Satisfied
Which issue does this PR close?
Closes #.
Rationale for this change
SortExec (with or w/out fetch) can work without an actual sort if the existing input order is required.
What changes are included in this PR?
- When we create a new SortExec, its execution mode is set according to its input order.
- Adding a fetch can update its execution mode.
- During execution, we consult the information of matching the requirements and having a fetch. If we already satisfy the requirements, SortExec either short-circuits itself (if it does not have a fetch), or behaves as a limit operator (if it has a fetch).
Are these changes tested?
Yes. I cannot practice the newly added stream types via an .slt test, but a unit test is added.
Are there any user-facing changes?
But in this case I would have expected the Sort to have been removed by one of the optimizer passes rather than the SortExec implementing a limit. 🤔
Right. The sort will actually be entirely removed in later passes. However, in the meantime (before its removal), any rule looking at the execution mode will still think this TopK is pipeline-breaking (while it is not), which results in them behaving incorrectly. We caught this behavior downstream in the context of a custom rule.
Basically, this PR does two things:
- It makes the
execution_modeAPI give a correct result when data is already sorted, so intermediate rules that look at execution modes do not falsely think that the pipeline breaks somewhere. - It makes TopK's execution logic behave like a limit if it somehow ends up in the plan without being removed. This is not possible in upstream DataFusion (sans bugs), but people using custom optimizers can end up in such a state. In such cases, it will also execute more efficiently.
Thanks for taking another look -- incorporated your feedback to leverage LimitStream
I will go ahead and merge this soon since it is a small, localized change that improves corner cases without interfering with the main use case of SortExec.
If there are any lingering concerns, we will address with a quick follow-on PR
Thank you for responding to the feedback @berkaysynnada and @ozankabak -- sorry for my delay -- I have been out. This PR now looks good to me