presto
presto copied to clipboard
Fix inserting values for single column of RowType
Description
Currently, when we try to execute insert values
with single column of RowType, like row(a int, b varchar)
or row(r row(a int, b varchar))
, it would fail because of incorrect unfolding for the row. This PR fix the problem.
Test Plan
- Make sure the fix do not affect other test cases
- Newly added test case in AbstractTestDistributedQueries.testInsert()
Contributor checklist
- [x] Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
- [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [x] If release notes are required, they follow the release notes guidelines.
- [x] Adequate tests were added if applicable.
- [x] CI passed.
Release Notes
== NO RELEASE NOTE ==
Lgtm, although introducing a new field in the scope isn't ideal. However, I couldn't identify a better alternative to address this issue
Without this change, I see the below behavior -
presto:simple> describe concrete;
Column | Type | Extra | Comment
--------+-------------------------------+-------+---------
x | row("a" integer, "b" varchar) | |
(1 row)
--Works
insert into concrete values ROW(ROW(1,'a'));
--Fails
insert into concrete values (ROW(1,'a'));
insert into concrete values ((1,'a'));
insert into concrete values (1,'a');
The difference between the success and failure cases is that the parsed com.facebook.presto.sql.tree.Query
has the com.facebook.presto.sql.tree.Values
mapped to a ROW(1,'a')
instead of a ROW(ROW(1,'a'))
I'm not sure if this is correct behavior for all of these failed cases, e.g when using (ROW(1,'a'))
, IMO, we are explicitly calling out that the input is a row of a row
The entry point for the parser is here: https://github.com/prestodb/presto/blob/a89f3e16227338216915405c63fe0e6a697d664c/presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/BuiltInQueryPreparer.java#L68-L70
IMO, we should investigate if the Analyzer is correct w.r.t to SQL grammar rules in interpreting the types passed for ROW types, I'm not familiar with these rules
Without this change, I see the below behavior -
presto:simple> describe concrete; Column | Type | Extra | Comment --------+-------------------------------+-------+--------- x | row("a" integer, "b" varchar) | | (1 row) --Works insert into concrete values ROW(ROW(1,'a')); --Fails insert into concrete values (ROW(1,'a')); insert into concrete values ((1,'a')); insert into concrete values (1,'a');
The difference between the success and failure cases is that the parsed
com.facebook.presto.sql.tree.Query
has thecom.facebook.presto.sql.tree.Values
mapped to aROW(1,'a')
instead of aROW(ROW(1,'a'))
I'm not sure if this is correct behavior for all of these failed cases, e.g when using(ROW(1,'a'))
, IMO, we are explicitly calling out that the input is a row of a row The entry point for the parser is here:https://github.com/prestodb/presto/blob/a89f3e16227338216915405c63fe0e6a697d664c/presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/BuiltInQueryPreparer.java#L68-L70
IMO, we should investigate if the Analyzer is correct w.r.t to SQL grammar rules in interpreting the types passed for ROW types, I'm not familiar with these rules
Thanks for you example. We can just figure out the grammar in your example. The sql grammar for row value is defined here: https://jakewheat.github.io/sql-overview/sql-2011-foundation-grammar.html#row-value-special-case.
Following the grammar, when there is only one column, we can omit the bracket for a row. But when we explicitly use Row(...)
, we should not simply treat it as a bracket and omit it. So, as I understand, when we execute statement insert into concrete values Row(Row(1, 'a'))
, we are trying to insert a row with single column data whose type is row(row(int, varchar))
. It has the same semantics as insert into concrete values (Row(Row(1, 'a')))
, so it's a wrong behavior if it works.
On the contrary, the three failure statements should not fail, because they all trying to insert a row with single column data whose type is row(int, varchar)
Following the grammar, when there is only one column, we can omit the bracket for a row. But when we explicitly use Row(...), we should not simply treat it as a bracket and omit it.
I'm not sure what interpretation of the parenthesis by the parser is correct. I will tag in a few other reviewers to comment on this. Let's try to arrive at the correct parser implementation if we can
ANTLR references for how VALUES is resolved for different systems :
https://github.com/prestodb/presto/blob/8242a40e7132946cc2de1645bef5dfdaadcc6a71/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4#L239
https://github.com/antlr/grammars-v4/blob/master/sql/postgresql/PostgreSQLParser.g4#L3123
values_clause
: VALUES OPEN_PAREN expr_list CLOSE_PAREN (COMMA OPEN_PAREN expr_list CLOSE_PAREN)*
;
https://github.com/antlr/grammars-v4/blob/master/sql/snowflake/SnowflakeParser.g4#L79
values_list
: VALUES '(' value_item (COMMA value_item)* ')'
;
https://github.com/antlr/grammars-v4/blob/master/sql/tsql/TSqlParser.g4#L4788
table_value_constructor
: VALUES '(' exps+=expression_list_ ')' (',' '(' exps+=expression_list_ ')')*
;
Presto allows for skipping parenthesis while most other systems do not My gut feeling here is that if we can solve the parenthesis matching that generates a ROW expression vs. one that matches the VALUES list, we could possibly have a more elegant solution
Presto allows for skipping parenthesis while most other systems do not My gut feeling here is that if we can solve the parenthesis matching that generates a ROW expression vs. one that matches the VALUES list, we could possibly have a more elegant solution
Thanks a lot for your great job of listing the ANTLR references. And I agree with you, if we can explicitly figure out a ROW expression and a line of value list in the parser, this problem could be resolved more elegantly. That means some grammar constraint may be introduced, as postgreSQL or snowflake does. And thanks for inviting other reviewers, they may have better thoughts.
Hi @aaneja @tdcmeehan , it's been a long time. Could we decide how to resolve this problem? Changing the grammar constraint of presto could be a solution, but in my opinion not to do so may be better, as it does indeed follow the SQL grammar spec and a lot of use cases and test cases are working based on it.
If not changing the grammar is ok, then I have refactored Scope
to make it not so problem customized, and support a mechanism which enable transmit messages from top to bottom when analyzing a statement tree.
What's your opinion? Any thoughts please let me know, thanks!