datafusion
datafusion copied to clipboard
Complete support for `Expr --> String `
Is your feature request related to a problem or challenge?
This ticket tracks the remaining work to complete https://github.com/apache/arrow-datafusion/issues/9495
As @devinjdangelo says in https://github.com/apache/arrow-datafusion/issues/9494#issuecomment-2007108672
We now have a solid foundation for converting Exprs --> SQL (see https://github.com/apache/arrow-datafusion/issues/9495 for why this is valuable).
It should now be straightforward to add support for the remaining Expr
types and doing so is a great way to get more familiar with DataFusion's core data structures and optimization algorithms without already having expertise in database internals.
Describe the solution you'd like
The basic task is to:
- Pick one (or a few) of the expressions below
- Create a PR to Implement the Expr --> AST reverse code (in this
match
statement) - Add a test (for example, here)
Here is the remaining list of exprs (just note on the ticket which you plan to work on)
- [ ] InList
- [ ] ScalarFunction
- [ ] Between
- [ ] Case
- [ ] WindowFunction
- [ ] Like
- [ ] ScalarVariable
- [ ] SimilarTo
- [ ] Not
- [ ] IsNotNull
- [ ] IsTrue
- [ ] IsFalse
- [ ] IsUnknown
- [ ] IsNotTrue
- [ ] IsNotUnknown
- [ ] Negative
- [ ] GetIndexedField
- [ ] TryCast
- [ ] Sort
- [ ] Exists
- [ ] Wildcard
- [ ] GroupingSet
- [ ] Placeholder
- [ ] OuterReferenceColumn
- [ ] Unnest
- [x] https://github.com/apache/datafusion/issues/10197
- [x] https://github.com/apache/datafusion/issues/10256
Describe alternatives you've considered
No response
Additional context
No response
take IsTrue, IsFalse, IsUnknown
I'd like to work on isNotNull, Not and Between
take ScalarFunction and InList
take Case
take Like
take Sort, Exists
I'd like to work on Negative
It looks like there are a few things not yet complete in https://github.com/apache/arrow-datafusion/blob/main/datafusion/sql/src/unparser/expr.rs#L52
- [ ] Window functions
- [ ] user defined functions
- [ ] (maybe more)
take SimilarTo, IsNotTrue, IsNotUnknown,Negative
Here is a list that has not been done yet, for interested contributor to better view:
- [ ] GetIndexedField
- [ ] ScalarVariable
- [ ] TryCast
- [x] Sort
- [x] Exists
- [ ] Wildcard
- [ ] GroupingSet
- [ ] Placeholder
- [ ] OuterReferenceColumn
- [ ] Unnest
- [x] WindowFunction
Also Aggregate Function only supports BuiltInAggregateFunction, and it didn't use filter
field.
Maybe we should use AggregateExpressionWithFilter when there's a filter
? See: https://github.com/sqlparser-rs/sqlparser-rs/blob/e747c9c2af08f4ea12e8d1692adf95998209e2a1/src/ast/mod.rs#L643-L649
I noticed that in the sqlparser-rs
crate, OrderByExpr is not part of the sqlparser::ast::Expr
enum (which is the return type for expr_to_sql
), which is a problem when I was implementing this for Expr::Sort
.
So I wanted to get some thoughts on how best to proceed, should we make a change in sqlparser-rs
, or should we just create a new enum to encapsulate the two, or anything else.
I noticed that in the
sqlparser-rs
crate, OrderByExpr is not part of thesqlparser::ast::Expr
enum (which is the return type forexpr_to_sql
), which is a problem when I was implementing this forExpr::Sort
.So I wanted to get some thoughts on how best to proceed, should we make a change in
sqlparser-rs
, or should we just create a new enum to encapsulate the two, or anything else.
I think when converting Expr::Sort
to an sql::ast::Expr
it should just return its inner Expr (and igore the ORDER BY information)
The ORDER BY information is required when turning the expression back into an entire query I think...
But at first just converting to sql::ast::Expr
by ignoring the ORDER BY is probably correct in most cases
Thank you everyone who has contributed so far :pray:. I just filed a PR to update datafusion-federation to point upstream for plan->SQL going forward https://github.com/datafusion-contrib/datafusion-federation/pull/30. The additional expr implementations upstream here will enable datafusion-federation to handle many more queries than before.
Here is a list that has not been done yet, for interested contributor to better view:
- [ ] GetIndexedField
- [ ] ScalarVariable
- [ ] TryCast
- [x] Sort
- [x] Exists
- [ ] Wildcard
- [ ] GroupingSet
- [ ] Placeholder
- [ ] OuterReferenceColumn
- [ ] Unnest
- [ ] WindowFunction
Take ScalarVariable
if this is still needed.
Thank you very much @devanbenz -- that would be awesome. Yes in general completing this list (or determining it is not possible) is very much needed
Sounds good. I have never contributed to this project before but am interested in helping out where I can and offering my support for this project -- @alamb ~~if I needed to ask any questions should that be done in the issue thread or is there a slack
irc
or discord
where datafusion discussion takes place?~~ disregard I see this in the README: https://arrow.apache.org/datafusion/contributor-guide/communication.html :)
Hi @devanbenz , I think ScalarVariable
has not been done yet and you could have a try, though it may not be possible.🤔
I think maybe it should be converted to ScalarValue
first, then call scalar_to_sql
function.
Here are some thoughts to help people figure out what SQL matches to what expression:
-
GetIndexedField
--> `SELECT col['field_name'] - ScalarVariable --> Not sure (maybe
@@foo@@?
- TryCast * -->
CAST (COL AS <datatype>>
- Sort -->
col ASC
orcol DESC NULLS FIRST
- Exists -->
col exists (SELECT 1 from foo)
- Wildcard -->
COUNT(*)
- GroupingSet --> Not sure
- Placeholder -->
SELECT $1
- OuterReferenceColumn --> not sure (I think this is the same as a column)
- Unnest -->
UNNEST(SELECT col FROM bar)
- WindowFunction -->
first_value(x ORDER BY y PARTITION BY z)
Hi @devanbenz , I think
ScalarVariable
has not been done yet and you could have a try, though it may not be possible.🤔 I think maybe it should be converted toScalarValue
first, then callscalar_to_sql
function.
Taking a look at this today. Will give it a try and update this thread with any thoughts, observations, or problems I run in to :)
Here are some thoughts to help people figure out what SQL matches to what expression:
GetIndexedField
--> `SELECT col['field_name']- ScalarVariable --> Not sure (maybe
@@foo@@?
- TryCast * -->
CAST (COL AS <datatype>>
- Sort -->
col ASC
orcol DESC NULLS FIRST
- Exists -->
col exists (SELECT 1 from foo)
- Wildcard -->
COUNT(*)
- GroupingSet --> Not sure
- Placeholder -->
SELECT $1
- OuterReferenceColumn --> not sure (I think this is the same as a column)
- Unnest -->
UNNEST(SELECT col FROM bar)
- WindowFunction -->
first_value(x ORDER BY y PARTITION BY z)
@alamb I beleive ScalarVariable
would just be something like @variable
.
Thanks @devanbenz !
@alamb if you could please take a look at https://github.com/sqlparser-rs/sqlparser-rs/pull/1260 -- it's linked to this issue :)
@alamb if you could please take a look at sqlparser-rs/sqlparser-rs#1260 -- it's linked to this issue :)
Thanks @devanbenz -- can you explain how this is related to this issue? (Sorry I am sure it is obvious but I can't keep everything in my head anymore )
@alamb if you could please take a look at sqlparser-rs/sqlparser-rs#1260 -- it's linked to this issue :)
Thanks @devanbenz -- can you explain how this is related to this issue? (Sorry I am sure it is obvious but I can't keep everything in my head anymore )
All good - this is for the Unparser
expr_to_sql
match statement https://github.com/apache/datafusion/blob/89443bff20015ef6c7646ca32754ece8c4093910/datafusion/sql/src/unparser/expr.rs#L391 I need to have a supported ast::Expr type for ScalarVariable from sql-parser. 👍
It seems to me that Expr::ScalarVariable
currently is parsed from ast::Identifier
(that starts with @
)-- perhaps instead of adding a new ast node, you could just create an equivalent ast::Identifier
🤔
https://github.com/apache/datafusion/blob/accce9732e26723cab2ffc521edbf5a3fe7460b3/datafusion/sql/src/expr/identifier.rs#L28-L43
It seems to me that
Expr::ScalarVariable
currently is parsed fromast::Identifier
(that starts with@
)-- perhaps instead of adding a new ast node, you could just create an equivalentast::Identifier
🤔https://github.com/apache/datafusion/blob/accce9732e26723cab2ffc521edbf5a3fe7460b3/datafusion/sql/src/expr/identifier.rs#L28-L43
Sounds good - I'll have some time to come back to this in the next couple days! Will modify per your suggestion :)
I filed tickets for the remaining issues
These ones are likely straightforward.
- [ ] https://github.com/apache/datafusion/issues/10518
- [ ] https://github.com/apache/datafusion/issues/10519
- [ ] #10522
The others are likely more involved as I don't know how to make a reproducer
With the completion of https://github.com/apache/datafusion/pull/10555 from @xinlifoobar I think this epic is now done!