sqllineage icon indicating copy to clipboard operation
sqllineage copied to clipboard

Error parsing tsql without semicolons

Open Guojcc opened this issue 1 year ago • 2 comments

Describe the bug Error parsing tsql containing 'group by' statement without semicolon. add TSQL_NO_SEMICOLON config.

SQL

select id from foo group by id 
select * from bar
  • if CLI (Command Line Interface): provide the command you're calling and the output.
sqllineage -e "select id from foo group by id select * from bar" -d "tsql"
Statements(#): 1
Source Tables:
    <default>.foo
Target Tables:

Python version (available via python --version)

  • 3.12.0

SQLLineage version (available via sqllineage --version):

  • 1.5.0

Additional context This type of tsql without semicolons can be parsed

select * from foo
select * from bar

image

Guojcc avatar Feb 01 '24 06:02 Guojcc

This is a parser issue with sqlfluff that we can verify with:

sqlfluff parse test.sql --dialect=tsql

There's only one statement parsed when GROUP BY clause is added:

select id from foo group by id 
select * from bar
[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    batch:
[L:  1, P:  1]      |        statement:
[L:  1, P:  1]      |            select_statement:
[L:  1, P:  1]      |                select_clause:
[L:  1, P:  1]      |                    keyword:                                  'select'
[L:  1, P:  7]      |                    [META] indent:
[L:  1, P:  7]      |                    whitespace:                               ' '
[L:  1, P:  8]      |                    select_clause_element:
[L:  1, P:  8]      |                        column_reference:
[L:  1, P:  8]      |                            naked_identifier:                 'id'
[L:  1, P: 10]      |                    [META] dedent:
[L:  1, P: 10]      |                whitespace:                                   ' '
[L:  1, P: 11]      |                from_clause:
[L:  1, P: 11]      |                    keyword:                                  'from'
[L:  1, P: 15]      |                    whitespace:                               ' '
[L:  1, P: 16]      |                    from_expression:
[L:  1, P: 16]      |                        [META] indent:
[L:  1, P: 16]      |                        from_expression_element:
[L:  1, P: 16]      |                            table_expression:
[L:  1, P: 16]      |                                table_reference:
[L:  1, P: 16]      |                                    naked_identifier:         'foo'
[L:  1, P: 19]      |                        [META] dedent:
[L:  1, P: 19]      |                whitespace:                                   ' '
[L:  1, P: 20]      |                groupby_clause:
[L:  1, P: 20]      |                    keyword:                                  'group'
[L:  1, P: 25]      |                    whitespace:                               ' '
[L:  1, P: 26]      |                    keyword:                                  'by'
[L:  1, P: 28]      |                    [META] indent:
[L:  1, P: 28]      |                    whitespace:                               ' '
[L:  1, P: 29]      |                    column_reference:
[L:  1, P: 29]      |                        naked_identifier:                     'id'
[L:  1, P: 31]      |                    whitespace:                               ' '
[L:  1, P: 32]      |                    newline:                                  '\n'
[L:  2, P:  1]      |                    expression:
[L:  2, P:  1]      |                        select_statement:
[L:  2, P:  1]      |                            select_clause:
[L:  2, P:  1]      |                                keyword:                      'select'
[L:  2, P:  7]      |                                [META] indent:
[L:  2, P:  7]      |                                whitespace:                   ' '
[L:  2, P:  8]      |                                select_clause_element:
[L:  2, P:  8]      |                                    wildcard_expression:
[L:  2, P:  8]      |                                        wildcard_identifier:
[L:  2, P:  8]      |                                            star:             '*'
[L:  2, P:  9]      |                                [META] dedent:
[L:  2, P:  9]      |                            whitespace:                       ' '
[L:  2, P: 10]      |                            from_clause:
[L:  2, P: 10]      |                                keyword:                      'from'
[L:  2, P: 14]      |                                whitespace:                   ' '
[L:  2, P: 15]      |                                from_expression:
[L:  2, P: 15]      |                                    [META] indent:
[L:  2, P: 15]      |                                    from_expression_element:
[L:  2, P: 15]      |                                        table_expression:
[L:  2, P: 15]      |                                            table_reference:
[L:  2, P: 15]      |                                                naked_identifier:  'bar'
[L:  2, P: 18]      |                                    [META] dedent:
[L:  2, P: 18]      |                    [META] dedent:
[L:  2, P: 18]      |    newline:                                                  '\n'
[L:  3, P:  1]      |    [META] end_of_file:

Whereas removing GROUP BY results in the correct AST with two statements:

select id from foo
select * from bar
[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    batch:
[L:  1, P:  1]      |        statement:
[L:  1, P:  1]      |            select_statement:
[L:  1, P:  1]      |                select_clause:
[L:  1, P:  1]      |                    keyword:                                  'select'
[L:  1, P:  7]      |                    [META] indent:
[L:  1, P:  7]      |                    whitespace:                               ' '
[L:  1, P:  8]      |                    select_clause_element:
[L:  1, P:  8]      |                        column_reference:
[L:  1, P:  8]      |                            naked_identifier:                 'id'
[L:  1, P: 10]      |                    [META] dedent:
[L:  1, P: 10]      |                whitespace:                                   ' '
[L:  1, P: 11]      |                from_clause:
[L:  1, P: 11]      |                    keyword:                                  'from'
[L:  1, P: 15]      |                    whitespace:                               ' '
[L:  1, P: 16]      |                    from_expression:
[L:  1, P: 16]      |                        [META] indent:
[L:  1, P: 16]      |                        from_expression_element:
[L:  1, P: 16]      |                            table_expression:
[L:  1, P: 16]      |                                table_reference:
[L:  1, P: 16]      |                                    naked_identifier:         'foo'
[L:  1, P: 19]      |                        [META] dedent:
[L:  1, P: 19]      |        whitespace:                                           ' '
[L:  1, P: 20]      |        newline:                                              '\n'
[L:  2, P:  1]      |        statement:
[L:  2, P:  1]      |            select_statement:
[L:  2, P:  1]      |                select_clause:
[L:  2, P:  1]      |                    keyword:                                  'select'
[L:  2, P:  7]      |                    [META] indent:
[L:  2, P:  7]      |                    whitespace:                               ' '
[L:  2, P:  8]      |                    select_clause_element:
[L:  2, P:  8]      |                        wildcard_expression:
[L:  2, P:  8]      |                            wildcard_identifier:
[L:  2, P:  8]      |                                star:                         '*'
[L:  2, P:  9]      |                    [META] dedent:
[L:  2, P:  9]      |                whitespace:                                   ' '
[L:  2, P: 10]      |                from_clause:
[L:  2, P: 10]      |                    keyword:                                  'from'
[L:  2, P: 14]      |                    whitespace:                               ' '
[L:  2, P: 15]      |                    from_expression:
[L:  2, P: 15]      |                        [META] indent:
[L:  2, P: 15]      |                        from_expression_element:
[L:  2, P: 15]      |                            table_expression:
[L:  2, P: 15]      |                                table_reference:
[L:  2, P: 15]      |                                    naked_identifier:         'bar'
[L:  2, P: 18]      |                        [META] dedent:
[L:  2, P: 18]      |    newline:                                                  '\n'
[L:  3, P:  1]      |    [META] end_of_file:

We need to fix it on sqlfluff side.

reata avatar Feb 04 '24 14:02 reata

I hava already submitted a bug to sqlfluff, Please help to supplement other content

[#5601]

Guojcc avatar Feb 05 '24 02:02 Guojcc