Support arbitrary expressions in `LIMIT` clause
Follow on to https://github.com/apache/arrow-datafusion/issues/9506
The idea is to support arbitrary expressions in the LIMIT clause, as long as they can be consolidated to a constant. For example:
❯ select * from (values (1)) LIMIT 10*100;
Error during planning: Unsupported operator for LIMIT clause
This query should be able to run (and return the single value), as the `+` case already does:
❯ select * from (values (1)) LIMIT 10+100;
+---------+
| column1 |
+---------+
| 1       |
+---------+
https://github.com/apache/arrow-datafusion/pull/9790 adds support for basic `+`/`-`, but a general-purpose solution that handles any expr that can be consolidated to a constant would be better.
As suggested by @jonahgao, this might look like changing the Limit logical plan to support arbitrary expressions:
pub struct Limit {
    pub skip: Expr,
    pub fetch: Option<Expr>,
    pub input: Arc<LogicalPlan>,
}
The SimplifyExpressions rule can automatically optimize them into constants. Some optimization rules such as PushDownLimit only run when the limit expression is a constant. We may need to add a cast for the limit expression when planning, only checking if it is a constant of type u64.
When creating the LimitExec physical plan, convert the limit expression into PhysicalExpr and evaluate it.
Originally posted by @jonahgao in https://github.com/apache/arrow-datafusion/pull/9790#discussion_r1539358701
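To make the suggestion above concrete, here is a minimal, self-contained sketch of folding `skip` and `fetch` expressions into non-negative constants before a physical limit would be built. Everything in it is an assumption for illustration: the toy `Expr` enum, the `fold`, `to_row_count`, and `resolve` helpers, and the trimmed-down `Limit` (no `input`) are hypothetical and are not DataFusion's actual types or API.

```rust
/// Toy expression type standing in for a planner's `Expr`
/// (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Limit node that carries expressions instead of plain integers.
/// Mirrors the shape proposed above, with `input` left out for brevity.
struct Limit {
    skip: Expr,
    fetch: Option<Expr>,
}

/// Fold an expression to an `i64` constant.
/// Fails if the expression references a column or overflows.
fn fold(expr: &Expr) -> Result<i64, String> {
    let overflow = || "LIMIT expression overflows i64".to_string();
    match expr {
        Expr::Literal(v) => Ok(*v),
        Expr::Column(name) => Err(format!(
            "LIMIT expression is not a constant: references column `{}`",
            name
        )),
        Expr::Add(l, r) => fold(l)?.checked_add(fold(r)?).ok_or_else(overflow),
        Expr::Sub(l, r) => fold(l)?.checked_sub(fold(r)?).ok_or_else(overflow),
        Expr::Mul(l, r) => fold(l)?.checked_mul(fold(r)?).ok_or_else(overflow),
    }
}

/// Fold an expression and check that the result is a valid row count
/// (the "constant of an unsigned type" check mentioned above).
fn to_row_count(expr: &Expr) -> Result<usize, String> {
    let value = fold(expr)?;
    usize::try_from(value)
        .map_err(|_| format!("LIMIT/OFFSET must be non-negative, got {}", value))
}

/// Resolve both parts of the limit to constants: `(skip, fetch)`.
fn resolve(limit: &Limit) -> Result<(usize, Option<usize>), String> {
    let skip = to_row_count(&limit.skip)?;
    let fetch = limit.fetch.as_ref().map(to_row_count).transpose()?;
    Ok((skip, fetch))
}
```

Under these assumptions, `LIMIT 10*100` resolves to a fetch of 1000, while something like `LIMIT 100 + x` is rejected because it still references a column after folding.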
I can do this one since I have been doing a related limit pushdown feature.
This should wait until https://github.com/apache/arrow-datafusion/pull/9815#issue-2209595815 is merged.
This one is probably ready to work on now
While re-reading this, I think we should start by implementing limits that can be evaluated to a constant by the time the physical plan is created (i.e., don't change the physical execution plans).
It is not clear to me what `LIMIT 100 + x` means.
The key use cases are:
- `LIMIT $1`: parameters like https://github.com/apache/datafusion/issues/12294
- `LIMIT 8 * 1024`: expressions that evaluate to constants
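The two use cases above can be sketched with a small stand-alone example: bind placeholder values first, then evaluate what remains to a constant row count. This is only an illustration under stated assumptions; `LimitExpr`, `bind_params`, and `evaluate_limit` are hypothetical names and do not correspond to DataFusion's API.

```rust
use std::collections::HashMap;

/// Toy LIMIT expression (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum LimitExpr {
    Literal(i64),
    /// A query parameter such as "$1".
    Placeholder(String),
    Mul(Box<LimitExpr>, Box<LimitExpr>),
}

/// Replace every placeholder with the literal value supplied for it.
fn bind_params(
    expr: &LimitExpr,
    params: &HashMap<String, i64>,
) -> Result<LimitExpr, String> {
    match expr {
        LimitExpr::Literal(v) => Ok(LimitExpr::Literal(*v)),
        LimitExpr::Placeholder(id) => params
            .get(id)
            .map(|v| LimitExpr::Literal(*v))
            .ok_or_else(|| format!("no value provided for parameter {}", id)),
        LimitExpr::Mul(l, r) => Ok(LimitExpr::Mul(
            Box::new(bind_params(l, params)?),
            Box::new(bind_params(r, params)?),
        )),
    }
}

/// Evaluate a placeholder-free expression to an `i64`.
fn eval_i64(expr: &LimitExpr) -> Result<i64, String> {
    match expr {
        LimitExpr::Literal(v) => Ok(*v),
        LimitExpr::Placeholder(id) => Err(format!("unbound parameter {}", id)),
        LimitExpr::Mul(l, r) => eval_i64(l)?
            .checked_mul(eval_i64(r)?)
            .ok_or_else(|| "LIMIT expression overflows i64".to_string()),
    }
}

/// Evaluate to a row count, rejecting negative results.
fn evaluate_limit(expr: &LimitExpr) -> Result<usize, String> {
    let value = eval_i64(expr)?;
    usize::try_from(value)
        .map_err(|_| format!("LIMIT must be non-negative, got {}", value))
}
```

In this sketch, binding happens before evaluation, so an expression that still contains a placeholder at evaluation time is reported as an error rather than silently treated as a constant.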
I will switch my focus to working on it.