build(deps): upgrade sqlparser to 0.46.0
Upgrade sqlparser to 0.46.0. This should handle a few of breaking changes:
- [x] Insert and Delete are now encapsulated in a struct.
- [ ] JsonAccess refactor - https://github.com/sqlparser-rs/sqlparser-rs/pull/1215/
- [x] SQLFunction refactor - https://github.com/sqlparser-rs/sqlparser-rs/pull/1247
- [x] Window Function refactor - https://github.com/sqlparser-rs/sqlparser-rs/pull/1237
This is working in progress.
@jmhain Thanks for your patch. I've applied it to this branch. Let us continue fixing the breaking changes here when we have time :D
Pushed a commit to handle other breaking changes. Now remains the JsonAccess and Agg func refactor. I'm still understanding how they are logically changed ...
Seems that we should merge AggregateExpressionWithFilter and ArrayAgg handle logic into sql_function_to_expr.
I am going to give this PR a look to see if I can help unstick it
There is still a failure which I haven't had a chance to look into:
cargo test --test sqllogictests -- array
...
Running "array_query.slt"
Running "array.slt"
External error: query failed: DataFusion error: SQL error: ParserError("Expected variant object key name, found: 2")
[SQL] select make_array(1, 2, 3)[1:2], make_array(1.0, 2.0, 3.0)[2:3], make_array('h', 'e', 'l', 'l', 'o')[2:4];
at test_files/array.slt:859
Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`
Caused by:
process didn't exit successfully: `/Users/andrewlamb/Software/datafusion/target/debug/deps/sqllogictests-b3a0a4f302be541c array` (exit status: 1)
I'm testing on @jmhain's branch https://github.com/sqlparser-rs/sqlparser-rs/pull/1290 on the subscript fixes and encounter the following new API breaks:
Compiling datafusion-sql v38.0.0 (/Users/tison/Brittani/arrow-datafusion/datafusion/sql)
error[E0599]: no variant named `ArrayIndex` found for enum `sqlparser::ast::Expr`
--> datafusion/sql/src/expr/mod.rs:197:22
|
197 | SQLExpr::ArrayIndex { obj, indexes } => {
| ^^^^^^^^^^ variant not found in `sqlparser::ast::Expr`
error[E0599]: no variant or associated item named `RawStringLiteral` found for enum `sqlparser::ast::Value` in the current scope
--> datafusion/sql/src/statement.rs:78:18
|
78 | | Value::RawStringLiteral(_)
| ^^^^^^^^^^^^^^^^
| |
| variant or associated item not found in `Value`
| help: there is a variant with a similar name: `HexStringLiteral`
error[E0026]: variant `sqlparser::ast::Statement::SetVariable` does not have a field named `variable`
--> datafusion/sql/src/statement.rs:232:17
|
232 | variable,
| ^^^^^^^^
| |
| variant `sqlparser::ast::Statement::SetVariable` does not have this field
| help: a field with a similar name exists: `variables`
Compiling datafusion-functions-aggregate v38.0.0 (/Users/tison/Brittani/arrow-datafusion/datafusion/functions-aggregate)
error[E0004]: non-exhaustive patterns: `FunctionArgumentClause::Having(_)` and `FunctionArgumentClause::Separator(_)` not covered
--> datafusion/sql/src/expr/function.rs:142:19
|
142 | match clause {
| ^^^^^^ patterns `FunctionArgumentClause::Having(_)` and `FunctionArgumentClause::Separator(_)` not covered
|
note: `FunctionArgumentClause` defined here
--> /Users/tison/.cargo/git/checkouts/sqlparser-rs-7df4902af8c87ae3/bd573b2/src/ast/mod.rs:4995:1
|
4995 | pub enum FunctionArgumentClause {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
5023 | Having(HavingBound),
| ------ not covered
...
5027 | Separator(Value),
| --------- not covered
= note: the matched value is of type `FunctionArgumentClause`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern, a match arm with multiple or-patterns as shown, or multiple match arms
|
166 ~ },
167 + FunctionArgumentClause::Having(_) | FunctionArgumentClause::Separator(_) => todo!()
|
Also related to -
- [x] https://github.com/sqlparser-rs/sqlparser-rs/pull/1262
- [x] https://github.com/sqlparser-rs/sqlparser-rs/pull/1252
- [x] https://github.com/sqlparser-rs/sqlparser-rs/pull/1258
- [x] https://github.com/sqlparser-rs/sqlparser-rs/pull/1256
I've handled all other trivial new ASTs. But how to migrate SQLExpr::ArrayIndex to SQLExpr::Subscript still need some time to work out. Perhaps @jmhain can take a look and patching this.
Ensure all the sqllogictests passed without the array index cases:
Details
### Array index
## array[i]
# single index with scalars #1 (positive index)
query IRT
select make_array(1, 2, 3)[1], make_array(1.0, 2.0, 3.0)[2], make_array('h', 'e', 'l', 'l', 'o')[3];
----
1 2 l
# single index with scalars #2 (zero index)
query I
select make_array(1, 2, 3)[0];
----
NULL
# single index with scalars #3 (negative index)
query IRT
select make_array(1, 2, 3)[-1], make_array(1.0, 2.0, 3.0)[-2], make_array('h', 'e', 'l', 'l', 'o')[-3];
----
3 2 l
# single index with scalars #4 (complex index)
query IRT
select make_array(1, 2, 3)[1 + 2 - 1], make_array(1.0, 2.0, 3.0)[2 * 1 * 0 - 2], make_array('h', 'e', 'l', 'l', 'o')[2 - 3];
----
2 2 o
# single index with columns #1 (positive index)
query ?RT
select column1[2], column2[3], column3[1] from arrays;
----
[3, ] 3.3 L
[5, 6] 6.6 i
[7, 8] 9.9 d
[9, 10] 12.2 s
NULL 15.5 a
[13, 14] NULL ,
[, 18] 18.8 NULL
# single index with columns #2 (zero index)
query ?RT
select column1[0], column2[0], column3[0] from arrays;
----
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
# single index with columns #3 (negative index)
query ?RT
select column1[-2], column2[-3], column3[-1] from arrays;
----
[, 2] 1.1 m
[3, 4] NULL m
[5, 6] 7.7 r
[7, ] 10.1 t
NULL 13.3 t
[11, 12] NULL ,
[15, 16] 16.6 NULL
# single index with columns #4 (complex index)
query ?RT
select column1[9 - 7], column2[2 * 0], column3[1 - 3] from arrays;
----
[3, ] NULL e
[5, 6] NULL u
[7, 8] NULL o
[9, 10] NULL i
NULL NULL e
[13, 14] NULL NULL
[, 18] NULL NULL
# TODO: support index as column
# single index with columns #5 (index as column)
# query ?
# select make_array(1, 2, 3, 4, 5)[column2] from arrays_with_repeating_elements;
# ----
# TODO: support argument and index as columns
# single index with columns #6 (argument and index as columns)
# query I
# select column1[column2] from arrays_with_repeating_elements;
# ----
## array[i:j]
# multiple index with columns #1 (positive index)
query ???
select make_array(1, 2, 3)[1:2], make_array(1.0, 2.0, 3.0)[2:3], make_array('h', 'e', 'l', 'l', 'o')[2:4];
----
[1, 2] [2.0, 3.0] [e, l, l]
query ???
select arrow_cast([1, 2, 3], 'LargeList(Int64)')[1:2],
arrow_cast([1.0, 2.0, 3.0], 'LargeList(Int64)')[2:3],
arrow_cast(['h', 'e', 'l', 'l', 'o'], 'LargeList(Utf8)')[2:4]
;
----
[1, 2] [2, 3] [e, l, l]
# multiple index with columns #2 (zero index)
query ???
select make_array(1, 2, 3)[0:0], make_array(1.0, 2.0, 3.0)[0:2], make_array('h', 'e', 'l', 'l', 'o')[0:6];
----
[] [1.0, 2.0] [h, e, l, l, o]
query ???
select arrow_cast([1, 2, 3], 'LargeList(Int64)')[0:0],
arrow_cast([1.0, 2.0, 3.0], 'LargeList(Int64)')[0:2],
arrow_cast(['h', 'e', 'l', 'l', 'o'], 'LargeList(Utf8)')[0:6]
;
----
[] [1, 2] [h, e, l, l, o]
query I
select arrow_cast([1, 2, 3], 'LargeList(Int64)')[1];
----
1
# TODO: support multiple negative index
# multiple index with columns #3 (negative index)
# query II
# select make_array(1, 2, 3)[-3:-1], make_array(1.0, 2.0, 3.0)[-3:-1], make_array('h', 'e', 'l', 'l', 'o')[-2:0];
# ----
# TODO: support complex index
# multiple index with columns #4 (complex index)
# query III
# select make_array(1, 2, 3)[2 + 1 - 1:10], make_array(1.0, 2.0, 3.0)[2 | 2:10], make_array('h', 'e', 'l', 'l', 'o')[6 ^ 6:10];
# ----
# multiple index with columns #1 (positive index)
query ???
select column1[2:4], column2[1:4], column3[3:4] from arrays;
----
[[3, ]] [1.1, 2.2, 3.3] [r, e]
[[5, 6]] [, 5.5, 6.6] [, u]
[[7, 8]] [7.7, 8.8, 9.9] [l, o]
[[9, 10]] [10.1, , 12.2] [t]
[] [13.3, 14.4, 15.5] [e, t]
[[13, 14]] [] []
[[, 18]] [16.6, 17.7, 18.8] []
# multiple index with columns #2 (zero index)
query ???
select column1[0:5], column2[0:3], column3[0:9] from arrays;
----
[[, 2], [3, ]] [1.1, 2.2, 3.3] [L, o, r, e, m]
[[3, 4], [5, 6]] [, 5.5, 6.6] [i, p, , u, m]
[[5, 6], [7, 8]] [7.7, 8.8, 9.9] [d, , l, o, r]
[[7, ], [9, 10]] [10.1, , 12.2] [s, i, t]
[] [13.3, 14.4, 15.5] [a, m, e, t]
[[11, 12], [13, 14]] [] [,]
[[15, 16], [, 18]] [16.6, 17.7, 18.8] []
# TODO: support negative index
# multiple index with columns #3 (negative index)
# query ?RT
# select column1[-2:-4], column2[-3:-5], column3[-1:-4] from arrays;
# ----
# [, 2] 1.1 m
# TODO: support complex index
# multiple index with columns #4 (complex index)
# query ?RT
# select column1[9 - 7:2 + 2], column2[1 * 0:2 * 3], column3[1 + 1 - 0:5 % 3] from arrays;
# ----
# TODO: support first index as column
# multiple index with columns #5 (first index as column)
# query ?
# select make_array(1, 2, 3, 4, 5)[column2:4] from arrays_with_repeating_elements
# ----
# TODO: support last index as column
# multiple index with columns #6 (last index as column)
# query ?RT
# select make_array(1, 2, 3, 4, 5)[2:column3] from arrays_with_repeating_elements;
# ----
# TODO: support argument and indices as column
# multiple index with columns #7 (argument and indices as column)
# query ?RT
# select column1[column2:column3] from arrays_with_repeating_elements;
# ----
# array[i:j:k]
# multiple index with columns #1 (positive index)
query ???
select make_array(1, 2, 3)[1:2:2], make_array(1.0, 2.0, 3.0)[2:3:2], make_array('h', 'e', 'l', 'l', 'o')[2:4:2];
----
[1] [2.0] [e, l]
# multiple index with columns #2 (zero index)
query ???
select make_array(1, 2, 3)[0:0:2], make_array(1.0, 2.0, 3.0)[0:2:2], make_array('h', 'e', 'l', 'l', 'o')[0:6:2];
----
[] [1.0] [h, l, o]
#TODO: sqlparser does not support negative index
## multiple index with columns #3 (negative index)
#query ???
#select make_array(1, 2, 3)[-1:-2:-2], make_array(1.0, 2.0, 3.0)[-2:-3:-2], make_array('h', 'e', 'l', 'l', 'o')[-2:-4:-2];
#----
#[1] [2.0] [e, l]
# multiple index with columns #1 (positive index)
query ???
select column1[2:4:2], column2[1:4:2], column3[3:4:2] from arrays;
----
[[3, ]] [1.1, 3.3] [r]
[[5, 6]] [, 6.6] []
[[7, 8]] [7.7, 9.9] [l]
[[9, 10]] [10.1, 12.2] [t]
[] [13.3, 15.5] [e]
[[13, 14]] [] []
[[, 18]] [16.6, 18.8] []
# multiple index with columns #2 (zero index)
query ???
select column1[0:5:2], column2[0:3:2], column3[0:9:2] from arrays;
----
[[, 2]] [1.1, 3.3] [L, r, m]
[[3, 4]] [, 6.6] [i, , m]
[[5, 6]] [7.7, 9.9] [d, l, r]
[[7, ]] [10.1, 12.2] [s, t]
[] [13.3, 15.5] [a, e]
[[11, 12]] [] [,]
[[15, 16]] [16.6, 18.8] []
Thanks @tisonkun -- I plan to review this and other sqlparser stuff tomorrow
I have been playing around with this PR locally. I think we need to make some additional changes to how SQL planning works for these structured field accesses, but I think it will be simpler when we are done. I'll push some changes shortly
I also see what else is needed upstream in sqlparser and I'll try and add it to @jmhain 's branch too
I updated went and updated @jmhain 's PR https://github.com/sqlparser-rs/sqlparser-rs/pull/1290 to support the stride argument https://github.com/sqlparser-rs/sqlparser-rs/pull/1290#issuecomment-2139994033
I then pushed a commit to this branch to rework how field access is planned. Let's see how the CI does
🎉 the tests are now passing
@alamb Thank you and cool! I'll update this PR once sqlparser released a 0.47.
I tried to update this PR to https://github.com/sqlparser-rs/sqlparser-rs/commit/afa5f08db9b1f3a4805f21fea6b1e72710cdb138 (the latest and greatest commit from sqlparser-rs) but the recent rework of the function definition seem to cause issues. I need to look at that again in more detail
@alamb there is even a new breaking change at https://github.com/sqlparser-rs/sqlparser-rs/pull/1272:
error[E0277]: the trait bound `Arc<str>: From<Box<sqlparser::ast::Expr>>` is not satisfied
--> datafusion/sql/src/expr/mod.rs:578:74
|
578 | DataType::Timestamp(TimeUnit::Nanosecond, Some(time_zone.into())),
| ^^^^ the trait `From<Box<sqlparser::ast::Expr>>` is not implemented for `Arc<str>`, which is required by `Box<sqlparser::ast::Expr>: Into<_>`
|
= help: the following other types implement trait `From<T>`:
<Arc<str> as From<&str>>
<Arc<str> as From<std::string::String>>
= note: required for `Box<sqlparser::ast::Expr>` to implement `Into<Arc<str>>`
I'm worried about downstream projects to upgrade datafusion now 🤣
Pushed some commit to catch up new sqlparser-rs commits. Let's see.
Since we're on the sqlparser-rs main branch now, I'd consider this patch as ready to merge.
But of course it's better to have a new release to sqlparser-rs and we move to that version. Hopefully it contains what it has now without new breaking changes :P
Window specific test failed. I don't have a windows env to debug ...
test cli_quick_test::case_4_set_batch_size ... FAILED
failures:
---- cli_quick_test::case_4_set_batch_size stdout ----
thread 'cli_quick_test::case_4_set_batch_size' panicked at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6\library\core\src\ops\function.rs:250:5:
Unexpected stdout, failed var == "[{\"name\":\"datafusion.execution.batch_size\",\"value\":\"1\"}]\n"
├── var: ""
└── var as str:
command=`"D:\\a\\datafusion\\datafusion\\datafusion-cli\\target\\debug\\datafusion-cli.exe" "--command" "show datafusion.execution.batch_size" "--format" "json" "-q" "-b" "1"`
code=-1073741571
stdout=""
stderr=```
thread \'main\' has overflowed its stack
Since we're on the sqlparser-rs main branch now, I'd consider this patch as ready to merge.
But of course it's better to have a new release to sqlparser-rs and we move to that version. Hopefully it contains what it has now without new breaking changes :P
Thank you so much @tisonkun -- I agree
I will go make a sqlparser release now without any new changes
Thanks @tisonkun -- I updated this PR to use the newly released 0.47.0 version of sqlparser-rs and updated the description. I suppose now we just need to debug the windows test (I have an idea) and we'll be good to go
My plan for this PR is to disable the failing test on windows and file a ticket to investigate
@alamb Thanks for your review and follow-up! @jmhain Thanks for your help :D
I did a final merge up from main to make sure we don't have any logical conflicts and then I plan to merge this PR