sqllineage icon indicating copy to clipboard operation
sqllineage copied to clipboard

The outermost select statement contains Chinese and cannot be parsed normally

Open Dkbei opened this issue 2 years ago • 1 comments

SELECT t1.日期 FROM t1 left join t2 on t1.id=t2.id; --Failed SELECT cast(t1.日期 as string) as 日期 FROM t1 left join t2 on t1.id=t2.id; --SUCCESS

Dkbei avatar May 27 '22 03:05 Dkbei

We don't officially support this feature and I'm surprised database/data warehouse support using Chinese as table/column name. Is it Aliyun you're using?

Even for the second SQL, sqllineage only parses table level lineage correctly, the column level result is still wrong. And if we look at the AST, it will be clear that '日期' is parsed as Error.

I don't think I'll come to work on this feature in the near future. This involves change on the underlying parser, which is something I want to avoid at this stage.

In [1]: sqlparse.parse("SELECT t1.日期 FROM t1 left join t2 on t1.id=t2.id")[0]._pprint_tree()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 Identifier 't1.'
|  |- 0 Name 't1'
|  `- 1 Punctuation '.'
|- 3 Error '日'
|- 4 Error '期'
|- 5 Whitespace ' '
|- 6 Keyword 'FROM'
|- 7 Whitespace ' '
|- 8 Identifier 't1'
|  `- 0 Name 't1'
|- 9 Whitespace ' '
|- 10 Keyword 'left j...'
|- 11 Whitespace ' '
|- 12 Identifier 't2'
|  `- 0 Name 't2'
|- 13 Whitespace ' '
|- 14 Keyword 'on'
|- 15 Whitespace ' '
`- 16 Comparison 't1.id=...'
   |- 0 Identifier 't1.id'
   |  |- 0 Name 't1'
   |  |- 1 Punctuation '.'
   |  `- 2 Name 'id'
   |- 1 Comparison '='
   `- 2 Identifier 't2.id'
      |- 0 Name 't2'
      |- 1 Punctuation '.'
      `- 2 Name 'id'

In [2]: sqlparse.parse("SELECT cast(t1.日期 as string) as 日期 FROM t1 left join t2 on t1.id=t2.id; ")[0]._pprint_tre
    ...: e()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 Identifier 'cast(t...'
|  |- 0 Function 'cast(t...'
|  |  |- 0 Identifier 'cast'
|  |  |  `- 0 Name 'cast'
|  |  `- 1 Parenthesis '(t1.日期...'
|  |     |- 0 Punctuation '('
|  |     |- 1 Identifier 't1.'
|  |     |  |- 0 Name 't1'
|  |     |  `- 1 Punctuation '.'
|  |     |- 2 Error '日'
|  |     |- 3 Identifier '期 as s...'
|  |     |  |- 0 Error '期'
|  |     |  |- 1 Whitespace ' '
|  |     |  |- 2 Keyword 'as'
|  |     |  |- 3 Whitespace ' '
|  |     |  `- 4 Identifier 'string'
|  |     |     `- 0 Name 'string'
|  |     `- 4 Punctuation ')'
|  |- 1 Whitespace ' '
|  |- 2 Keyword 'as'
|  |- 3 Whitespace ' '
|  `- 4 Error '日'
|- 3 Error '期'
|- 4 Whitespace ' '
|- 5 Keyword 'FROM'
|- 6 Whitespace ' '
|- 7 Identifier 't1'
|  `- 0 Name 't1'
|- 8 Whitespace ' '
|- 9 Keyword 'left j...'
|- 10 Whitespace ' '
|- 11 Identifier 't2'
|  `- 0 Name 't2'
|- 12 Whitespace ' '
|- 13 Keyword 'on'
|- 14 Whitespace ' '
|- 15 Comparison 't1.id=...'
|  |- 0 Identifier 't1.id'
|  |  |- 0 Name 't1'
|  |  |- 1 Punctuation '.'
|  |  `- 2 Name 'id'
|  |- 1 Comparison '='
|  `- 2 Identifier 't2.id'
|     |- 0 Name 't2'
|     |- 1 Punctuation '.'
|     `- 2 Name 'id'
|- 16 Punctuation ';'
`- 17 Whitespace ' '

reata avatar Jun 06 '22 11:06 reata