datafusion
datafusion copied to clipboard
Support limit in StreamingTableExec
Which issue does this PR close?
Closes todo.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Seems due to filter, limit won't be pushed down to table provider. So I added a simple test in limit.slt .
FYI @metesynnada
@alamb, this looks good to me. However, I'm concerned there might be redundant LimitStream
overhead when both GlobalLimitExec
and StreamTableExec
are present in the plan, as LimitStream
appears to be used twice.
Additionally, it would be better to use unit tests addition to slt
tests, given that GlobalLimitExec
already ensures the tests function properly even if StreamTableExec
fails.
@alamb, this looks good to me. However, I'm concerned there might be redundant
LimitStream
overhead when bothGlobalLimitExec
andStreamTableExec
are present in the plan, asLimitStream
appears to be used twice.Additionally, it would be better to use unit tests addition to
slt
tests, given thatGlobalLimitExec
already ensures the tests function properly even ifStreamTableExec
fails.
Those are excellent points. @lewiszlw are you able to add the unit tests? Otherwise I can try and find some time to do so
Hmm, I don't get the point. How should I add unit tests?
Hmm, I don't get the point. How should I add unit tests?
The executor should be capable of running independently within a test. For more insight, refer to the hash join tests in datafusion/physical-plan/src/joins/hash_join.rs. Your tests ofc will be much more simple.
Currently, a GlobalLimitExec
is already implemented at the top of the existing tests, which means these tests wouldn't fail if the limit implementation in StreamingTableExec were to change.
I think the idea would be to create a StreamingTableExec
directly and then read data from it, ensuring that the limit was applied
I didn't see any existing unit tests for StreamingTableExec
so we would have to write some I think
https://github.com/apache/datafusion/blob/fa31c781d2cb0bdfba06dcc07bc75d9a5f9686b2/datafusion/physical-plan/src/streaming.rs#L221
Maybe I missed it
If you don't have time to do this @lewiszlw I can try and help, or perhaps we can file a ticket and do it as a follow on PR
If you don't have time to do this @lewiszlw I can try and help, or perhaps we can file a ticket and do it as a follow on PR
Yeah, I'm on vacation. So might not action in time.
If you don't have time to do this @lewiszlw I can try and help, or perhaps we can file a ticket and do it as a follow on PR
Yeah, I'm on vacation. So might not action in time.
No problem -- I am giving it a shot
I pushed a unit test in 30bf3bf
Thanks again @lewiszlw and @metesynnada