grammars-v4

[postgresql] Ambiguity with ROLLUP.

Open · kaby76 opened this issue 11 months ago · 1 comment

We're now getting into the more difficult ambiguities in the grammar. We're also making great progress on parse speed: parsing all of the input now takes ~9s, down from ~42s (both measurements were taken with function-body parsing disabled). Before function-body parsing was disabled, parsing the entire test suite took ~45s.

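Consider the input "SELECT c, sum(a) FROM pagg_tab GROUP BY rollup(c) ORDER BY 1, 2;". It can be parsed in two ways: in the GROUP BY clause, rollup(c) is either a rollup_clause or an ordinary function call (an a_expr) whose function name is the unreserved keyword ROLLUP. Running trparse with --ambig prints both parse trees: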

11/10-19:25:02 ~/issues/g4-more-postgresql/sql/postgresql/Generated-CSharp
$ trparse -i 'SELECT c, sum(a) FROM pagg_tab GROUP BY rollup(c) ORDER BY 1, 2;' --ambig | trtree -a
CSharp 0 string success 0.1997469
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "SELECT") (target_list_ (target_list (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (COMMA ",") (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (identifier (Identifier "sum")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "a")))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))))))))))))))))))))))) (from_clause (FROM "FROM") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "pagg_tab")))))))) (group_clause (GROUP_P "GROUP") (BY "BY") (group_by_list (group_by_item (rollup_clause (ROLLUP "rollup") (OPEN_PAREN "(") (expr_list (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul 
(a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))) (sort_clause_ (sort_clause (ORDER "ORDER") (BY "BY") (sortby_list (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "1")))))))))))))))))))))))))) (COMMA ",") (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "2")))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "SELECT") (target_list_ (target_list (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (COMMA ",") (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (identifier (Identifier "sum")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "a")))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))))))))))))))))))))))) (from_clause (FROM "FROM") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "pagg_tab")))))))) (group_clause (GROUP_P "GROUP") (BY "BY") (group_by_list (group_by_item (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate 
(a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (unreserved_keyword (ROLLUP "rollup")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c")))))))))))))))))))))))))))) (CLOSE_PAREN ")"))))))))))))))))))))))))))))))) (sort_clause_ (sort_clause (ORDER "ORDER") (BY "BY") (sortby_list (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "1")))))))))))))))))))))))))) (COMMA ",") (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "2")))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))
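To make the ambiguity concrete, here is a minimal Python sketch. It is not the grammar's code: it is a hypothetical, heavily simplified model of only the two group_by_item alternatives visible in the trees above, with expr_list and func_arg_list reduced to a single identifier. The function names are made up for illustration.

```python
# Hypothetical, heavily simplified model of the two group_by_item
# alternatives seen in the parse trees. Tokens are (type, text) pairs;
# expr_list and func_arg_list are reduced to a single Identifier.

# Per the gram.y comment, CUBE and ROLLUP are not reserved, so they can
# also serve as function names.
UNRESERVED_KEYWORDS = {"CUBE", "ROLLUP"}


def match_rollup_clause(types):
    # rollup_clause : ROLLUP '(' expr_list ')'
    return types == ["ROLLUP", "OPEN_PAREN", "Identifier", "CLOSE_PAREN"]


def match_function_call(types):
    # a_expr -> ... -> func_application : func_name '(' func_arg_list ')'
    # func_name may be an Identifier or an unreserved keyword.
    if len(types) != 4:
        return False
    name_ok = types[0] == "Identifier" or types[0] in UNRESERVED_KEYWORDS
    return name_ok and types[1:] == ["OPEN_PAREN", "Identifier", "CLOSE_PAREN"]


def group_by_item_parses(tokens):
    """Return every alternative that matches the whole token sequence."""
    types = [t for t, _ in tokens]
    alternatives = [
        ("rollup_clause", match_rollup_clause),
        ("a_expr (function call)", match_function_call),
    ]
    return [name for name, match in alternatives if match(types)]


rollup_call = [("ROLLUP", "rollup"), ("OPEN_PAREN", "("),
               ("Identifier", "c"), ("CLOSE_PAREN", ")")]
sum_call = [("Identifier", "sum"), ("OPEN_PAREN", "("),
            ("Identifier", "c"), ("CLOSE_PAREN", ")")]

print(group_by_item_parses(rollup_call))  # ['rollup_clause', 'a_expr (function call)']
print(group_by_item_parses(sum_call))     # ['a_expr (function call)']
```

The same four tokens satisfy both alternatives, which is exactly what the two trees show; an ordinary call such as sum(c) matches only one.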

The root cause is described in a comment in PostgreSQL's gram.y:

To support CUBE and ROLLUP in GROUP BY without reserving them, we give them an explicit priority lower than '(', so that a rule with CUBE '(' will shift rather than reducing a conflicting rule that takes CUBE as a function name. Using the same precedence as IDENT seems right for the reasons given above.
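The gram.y comment relies on Bison's precedence rules for resolving shift/reduce conflicts. Below is a minimal Python sketch of those rules, not Bison's implementation. The numeric levels and associativities are illustrative assumptions; only their ordering (CUBE and ROLLUP at the same level as IDENT, below '(') comes from the comment.

```python
# Minimal sketch of Bison's precedence-based shift/reduce resolution.
# Numeric levels and associativities are illustrative assumptions; only the
# ordering (CUBE/ROLLUP same as IDENT, lower than '(') is taken from gram.y.
PRECEDENCE = {
    "IDENT": (1, "nonassoc"),
    "CUBE": (1, "nonassoc"),
    "ROLLUP": (1, "nonassoc"),
    "'('": (2, "left"),
}


def resolve(rule_last_terminal, lookahead):
    """Decide between reducing a rule and shifting the lookahead token.

    Without %prec, a rule takes the precedence of the last terminal in
    its right-hand side, passed here as rule_last_terminal.
    """
    rule_prec = PRECEDENCE.get(rule_last_terminal)
    token_prec = PRECEDENCE.get(lookahead)
    if rule_prec is None or token_prec is None:
        # Nothing to compare: Bison shifts by default and reports the conflict.
        return "shift"
    if token_prec[0] > rule_prec[0]:
        return "shift"
    if token_prec[0] < rule_prec[0]:
        return "reduce"
    # Same level: the shared associativity decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[token_prec[1]]


# After reading ROLLUP with '(' as lookahead, the parser can either reduce
# unreserved_keyword : ROLLUP (making rollup a function name) or shift '('
# to continue rollup_clause : ROLLUP '(' expr_list ')'.
print(resolve("ROLLUP", "'('"))  # shift
```

Because ROLLUP sits below '(', the reduction that would turn rollup into a function name loses, and Bison keeps parsing a rollup_clause. ANTLR 4 has no direct counterpart to these token precedence declarations, which is consistent with the .g4 port admitting both parses in the trparse output.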

kaby76, Nov 11 '24 00:11