JSqlParser icon indicating copy to clipboard operation
JSqlParser copied to clipboard

ParseException due to ARRAY_AGG() and SAFE_CAST()

Open Tianqi1006 opened this issue 3 years ago • 7 comments

Hi,

We recently found this ParseException due to the use of ARRAY_AGG() and SAFE_CAST() from the built-in functions provided by Google BigQuery. However, when I tried to validate the SQL syntax, I got this error:

ParseException: Cannot parse statement: Encountered unexpected token: "(" "(" at line 1, column 17.

Was expecting one of:

"&"
"::"
";"
"<<"
">>"
"ACTION"
"ACTIVE"
"ALGORITHM"
"ARCHIVE"
"ARRAY"
"AS"
"AT"
"BYTE"
"CASCADE"
"CASE"
"CAST"
"CHANGE"
"CHAR"
"CHARACTER"
"CHECKPOINT"
"COLLATE"
"COLUMN"
"COLUMNS"
"COMMENT"
"COMMIT"
"CONNECT"
"COSTS"
"CYCLE"
"DBA_RECYCLEBIN"
"DEFAULT"
"DESC"
"DESCRIBE"
"DISABLE"
"DISCONNECT"
"DIV"
"DO"
"DUMP"
"DUPLICATE"
"EMIT"
"ENABLE"
"END"
"EXCLUDE"
"EXTRACT"
"FALSE"
"FILTER"
"FIRST"
"FLUSH"
"FN"
"FOLLOWING"
"FORMAT"
"FULLTEXT"
"GROUP"
"HAVING"
"HISTORY"
"INDEX"
"INSERT"
"INTERVAL"
"INTO"
"ISNULL"
"JSON"
"KEY"
"LAST"
"LEADING"
"LINK"
"LOCAL"
"LOG"
"MATERIALIZED"
"NO"
"NOLOCK"
"NULLS"
"OF"
"OPEN"
"OVER"
"PARALLEL"
"PARTITION"
"PATH"
"PERCENT"
"PRECISION"
"PRIMARY"
"PRIOR"
"QUERY"
"QUIESCE"
"RANGE"
"READ"
"RECYCLEBIN"
"REGISTER"
"REPLACE"
"RESTRICTED"
"RESUME"
"ROW"
"ROWS"
"SCHEMA"
"SEPARATOR"
"SEQUENCE"
"SESSION"
"SHUTDOWN"
"SIBLINGS"
"SIGNED"
"SIZE"
"SKIP"
"START"
"SUSPEND"
"SWITCH"
"SYNONYM"
"SYSTEM"
"TABLE"
"TABLESPACE"
"TEMP"
"TEMPORARY"
"TIMEOUT"
"TO"
"TOP"
"TRUE"
"TRUNCATE"
"TRY_CAST"
"TYPE"
"UNQIESCE"
"UNSIGNED"
"USER"
"VALIDATE"
"VALUE"
"VALUES"
"VIEW"
"WINDOW"
"XML"
"ZONE"
"["
"^"
"|"
<EOF>
<K_DATETIMELITERAL>
<K_DATE_LITERAL>
<K_NEXTVAL>
<K_STRING_FUNCTION_NAME>
<S_CHAR_LITERAL>
<S_IDENTIFIER>
<S_QUOTED_IDENTIFIER>

The SQL statement looks like this: select array_agg(safe_cast(n as Float64)) as temp from UNNEST(SPLIT("0,0,0,0,0",",")) as n

We are wondering is there a way to allow ARRAY_AGG() and SAFE_CAST() without throwing exceptions?

Thank you very much in advance, Tianqi

Tianqi1006 avatar Aug 24 '22 17:08 Tianqi1006

Greetings,

The problem is about SAFE_CAST(x AS y), which is not a regular function but a special expression. We will need to extend the CAST() expression accordingly.

manticore-projects avatar Aug 24 '22 18:08 manticore-projects

Greetings,

Thanks a lot for quickly identifying the root cause of the problem. Just would like to follow up with you and see when approximately will this change be applied / released? Any details would be greatly appreciated.

Best regards, Tianqi

Tianqi1006 avatar Aug 25 '22 13:08 Tianqi1006

Hi @manticore-projects , not only SAFE_CAST, but also ARRAY_AGG can not be fully correctly parsed. failed cases:

  1. ARRAY_AGG(x LMIT 10)
  2. ARRAY_AGG( x IGNORE NULLS ORDER BY x)
  3. ARRAY_AGG( DISTINCT x) Google ARRAY_AGG references: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg JsqlParser Version: 4.5

dequn avatar Sep 19 '22 05:09 dequn

1. `ARRAY_AGG(x LMIT 10)`
2. `ARRAY_AGG( x IGNORE NULLS ORDER BY x)`
3. `ARRAY_AGG( DISTINCT x)`
   Google ARRAY_AGG references: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#array_agg

Certainly, because all of those are mot standard compliant, but Google BigQuery specific. You will need to sponsor or provide an implementation for such special function parameters.

manticore-projects avatar Sep 19 '22 06:09 manticore-projects

@manticore-projects Thanks, I add SAFE_CAST support and create a PR.

dequn avatar Sep 19 '22 06:09 dequn

Do I get this right, that this issue is not yet complete by the PR?

wumpz avatar Sep 20 '22 18:09 wumpz

@wumpz Yes, It only solves SAFE_CAST related problem, ARRAY_AGG is not included in this PR.

dequn avatar Sep 21 '22 07:09 dequn

Continued as Feature request under #1856

manticore-projects avatar Sep 02 '23 05:09 manticore-projects