datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Complete support for `Expr --> String `

Open alamb opened this issue 11 months ago • 27 comments

Is your feature request related to a problem or challenge?

This ticket tracks the remaining work to complete https://github.com/apache/arrow-datafusion/issues/9495

As @devinjdangelo says in https://github.com/apache/arrow-datafusion/issues/9494#issuecomment-2007108672

We now have a solid foundation for converting Exprs --> SQL (see https://github.com/apache/arrow-datafusion/issues/9495 for why this is valuable).

It should now be straightforward to add support for the remaining Expr types and doing so is a great way to get more familiar with DataFusion's core data structures and optimization algorithms without already having expertise in database internals.

Describe the solution you'd like

The basic task is to:

  1. Pick one (or a few) of the expressions below
  2. Create a PR to Implement the Expr --> AST reverse code (in this match statement)
  3. Add a test (for example, here)

Here is the remaining list of exprs (just note on the ticket which you plan to work on)

  • [ ] InList
  • [ ] ScalarFunction
  • [ ] Between
  • [ ] Case
  • [ ] WindowFunction
  • [ ] Like
  • [ ] ScalarVariable
  • [ ] SimilarTo
  • [ ] Not
  • [ ] IsNotNull
  • [ ] IsTrue
  • [ ] IsFalse
  • [ ] IsUnknown
  • [ ] IsNotTrue
  • [ ] IsNotUnknown
  • [ ] Negative
  • [ ] GetIndexedField
  • [ ] TryCast
  • [ ] Sort
  • [ ] Exists
  • [ ] Wildcard
  • [ ] GroupingSet
  • [ ] Placeholder
  • [ ] OuterReferenceColumn
  • [ ] Unnest
  • [x] https://github.com/apache/datafusion/issues/10197
  • [x] https://github.com/apache/datafusion/issues/10256

Describe alternatives you've considered

No response

Additional context

No response

alamb avatar Mar 21 '24 17:03 alamb

take IsTrue, IsFalse, IsUnknown

Lordworms avatar Mar 22 '24 01:03 Lordworms

I'd like to work on isNotNull, Not and Between

sebastian2296 avatar Mar 22 '24 12:03 sebastian2296

take ScalarFunction and InList

yyy1000 avatar Mar 23 '24 22:03 yyy1000

take Case

yyy1000 avatar Mar 25 '24 18:03 yyy1000

take Like

Weijun-H avatar Mar 26 '24 03:03 Weijun-H

take Sort, Exists

kevinmingtarja avatar Mar 26 '24 05:03 kevinmingtarja

I'd like to work on Negative

pratikpugalia avatar Mar 26 '24 22:03 pratikpugalia

It looks like there are a few things not yet complete in https://github.com/apache/arrow-datafusion/blob/main/datafusion/sql/src/unparser/expr.rs#L52

  • [ ] Window functions
  • [ ] user defined functions
  • [ ] (maybe more)

alamb avatar Apr 01 '24 16:04 alamb

take SimilarTo, IsNotTrue, IsNotUnknown,Negative

yyy1000 avatar Apr 01 '24 18:04 yyy1000

Here is a list that has not been done yet, for interested contributor to better view:

  • [ ] GetIndexedField
  • [ ] ScalarVariable
  • [ ] TryCast
  • [x] Sort
  • [x] Exists
  • [ ] Wildcard
  • [ ] GroupingSet
  • [ ] Placeholder
  • [ ] OuterReferenceColumn
  • [ ] Unnest
  • [x] WindowFunction

yyy1000 avatar Apr 01 '24 18:04 yyy1000

Also Aggregate Function only supports BuiltInAggregateFunction, and it didn't use filter field. Maybe we should use AggregateExpressionWithFilter when there's a filter? See: https://github.com/sqlparser-rs/sqlparser-rs/blob/e747c9c2af08f4ea12e8d1692adf95998209e2a1/src/ast/mod.rs#L643-L649

yyy1000 avatar Apr 01 '24 18:04 yyy1000

I noticed that in the sqlparser-rs crate, OrderByExpr is not part of the sqlparser::ast::Expr enum (which is the return type for expr_to_sql), which is a problem when I was implementing this for Expr::Sort.

So I wanted to get some thoughts on how best to proceed, should we make a change in sqlparser-rs, or should we just create a new enum to encapsulate the two, or anything else.

kevinmingtarja avatar Apr 02 '24 18:04 kevinmingtarja

I noticed that in the sqlparser-rs crate, OrderByExpr is not part of the sqlparser::ast::Expr enum (which is the return type for expr_to_sql), which is a problem when I was implementing this for Expr::Sort.

So I wanted to get some thoughts on how best to proceed, should we make a change in sqlparser-rs, or should we just create a new enum to encapsulate the two, or anything else.

I think when converting Expr::Sort to an sql::ast::Expr it should just return its inner Expr (and igore the ORDER BY information)

The ORDER BY information is required when turning the expression back into an entire query I think...

But at first just converting to sql::ast::Expr by ignoring the ORDER BY is probably correct in most cases

alamb avatar Apr 02 '24 18:04 alamb

Thank you everyone who has contributed so far :pray:. I just filed a PR to update datafusion-federation to point upstream for plan->SQL going forward https://github.com/datafusion-contrib/datafusion-federation/pull/30. The additional expr implementations upstream here will enable datafusion-federation to handle many more queries than before.

devinjdangelo avatar Apr 06 '24 13:04 devinjdangelo

Here is a list that has not been done yet, for interested contributor to better view:

  • [ ] GetIndexedField
  • [ ] ScalarVariable
  • [ ] TryCast
  • [x] Sort
  • [x] Exists
  • [ ] Wildcard
  • [ ] GroupingSet
  • [ ] Placeholder
  • [ ] OuterReferenceColumn
  • [ ] Unnest
  • [ ] WindowFunction

Take ScalarVariable if this is still needed.

devanbenz avatar Apr 15 '24 15:04 devanbenz

Thank you very much @devanbenz -- that would be awesome. Yes in general completing this list (or determining it is not possible) is very much needed

alamb avatar Apr 16 '24 15:04 alamb

Sounds good. I have never contributed to this project before but am interested in helping out where I can and offering my support for this project -- @alamb ~~if I needed to ask any questions should that be done in the issue thread or is there a slack irc or discord where datafusion discussion takes place?~~ disregard I see this in the README: https://arrow.apache.org/datafusion/contributor-guide/communication.html :)

devanbenz avatar Apr 16 '24 16:04 devanbenz

Hi @devanbenz , I think ScalarVariable has not been done yet and you could have a try, though it may not be possible.🤔 I think maybe it should be converted to ScalarValue first, then call scalar_to_sql function.

yyy1000 avatar Apr 16 '24 16:04 yyy1000

Here are some thoughts to help people figure out what SQL matches to what expression:

  • GetIndexedField --> `SELECT col['field_name']
  • ScalarVariable --> Not sure (maybe @@foo@@?
  • TryCast * --> CAST (COL AS <datatype>>
  • Sort --> col ASC or col DESC NULLS FIRST
  • Exists --> col exists (SELECT 1 from foo)
  • Wildcard --> COUNT(*)
  • GroupingSet --> Not sure
  • Placeholder --> SELECT $1
  • OuterReferenceColumn --> not sure (I think this is the same as a column)
  • Unnest --> UNNEST(SELECT col FROM bar)
  • WindowFunction --> first_value(x ORDER BY y PARTITION BY z)

alamb avatar Apr 16 '24 21:04 alamb

Hi @devanbenz , I think ScalarVariable has not been done yet and you could have a try, though it may not be possible.🤔 I think maybe it should be converted to ScalarValue first, then call scalar_to_sql function.

Taking a look at this today. Will give it a try and update this thread with any thoughts, observations, or problems I run in to :)

devanbenz avatar Apr 23 '24 13:04 devanbenz

Here are some thoughts to help people figure out what SQL matches to what expression:

  • GetIndexedField --> `SELECT col['field_name']
  • ScalarVariable --> Not sure (maybe @@foo@@?
  • TryCast * --> CAST (COL AS <datatype>>
  • Sort --> col ASC or col DESC NULLS FIRST
  • Exists --> col exists (SELECT 1 from foo)
  • Wildcard --> COUNT(*)
  • GroupingSet --> Not sure
  • Placeholder --> SELECT $1
  • OuterReferenceColumn --> not sure (I think this is the same as a column)
  • Unnest --> UNNEST(SELECT col FROM bar)
  • WindowFunction --> first_value(x ORDER BY y PARTITION BY z)

@alamb I beleive ScalarVariable would just be something like @variable.

devanbenz avatar Apr 23 '24 13:04 devanbenz

Thanks @devanbenz !

alamb avatar Apr 23 '24 21:04 alamb

@alamb if you could please take a look at https://github.com/sqlparser-rs/sqlparser-rs/pull/1260 -- it's linked to this issue :)

devanbenz avatar May 06 '24 13:05 devanbenz

@alamb if you could please take a look at sqlparser-rs/sqlparser-rs#1260 -- it's linked to this issue :)

Thanks @devanbenz -- can you explain how this is related to this issue? (Sorry I am sure it is obvious but I can't keep everything in my head anymore )

alamb avatar May 06 '24 16:05 alamb

@alamb if you could please take a look at sqlparser-rs/sqlparser-rs#1260 -- it's linked to this issue :)

Thanks @devanbenz -- can you explain how this is related to this issue? (Sorry I am sure it is obvious but I can't keep everything in my head anymore )

All good - this is for the Unparser expr_to_sql match statement https://github.com/apache/datafusion/blob/89443bff20015ef6c7646ca32754ece8c4093910/datafusion/sql/src/unparser/expr.rs#L391 I need to have a supported ast::Expr type for ScalarVariable from sql-parser. 👍

devanbenz avatar May 06 '24 20:05 devanbenz

It seems to me that Expr::ScalarVariable currently is parsed from ast::Identifier (that starts with @)-- perhaps instead of adding a new ast node, you could just create an equivalent ast::Identifier 🤔

https://github.com/apache/datafusion/blob/accce9732e26723cab2ffc521edbf5a3fe7460b3/datafusion/sql/src/expr/identifier.rs#L28-L43

alamb avatar May 07 '24 19:05 alamb

It seems to me that Expr::ScalarVariable currently is parsed from ast::Identifier (that starts with @)-- perhaps instead of adding a new ast node, you could just create an equivalent ast::Identifier 🤔

https://github.com/apache/datafusion/blob/accce9732e26723cab2ffc521edbf5a3fe7460b3/datafusion/sql/src/expr/identifier.rs#L28-L43

Sounds good - I'll have some time to come back to this in the next couple days! Will modify per your suggestion :)

devanbenz avatar May 07 '24 19:05 devanbenz

I filed tickets for the remaining issues

These ones are likely straightforward.

  • [ ] https://github.com/apache/datafusion/issues/10518
  • [ ] https://github.com/apache/datafusion/issues/10519
  • [ ] #10522

The others are likely more involved as I don't know how to make a reproducer

alamb avatar May 15 '24 10:05 alamb

With the completion of https://github.com/apache/datafusion/pull/10555 from @xinlifoobar I think this epic is now done!

alamb avatar May 20 '24 13:05 alamb