sqllineage
sqllineage copied to clipboard
The outermost select statement contains Chinese and cannot be parsed normally
SELECT t1.日期 FROM t1 left join t2 on t1.id=t2.id; --Failed SELECT cast(t1.日期 as string) as 日期 FROM t1 left join t2 on t1.id=t2.id; --SUCCESS
We don't officially support this feature and I'm surprised database/data warehouse support using Chinese as table/column name. Is it Aliyun you're using?
Even for the second SQL, sqllineage only parses table level lineage correctly, the column level result is still wrong. And if we look at the AST, it will be clear that '日期' is parsed as Error.
I don't think I'll come to work on this feature in the near future. This involves change on the underlying parser, which is something I want to avoid at this stage.
In [1]: sqlparse.parse("SELECT t1.日期 FROM t1 left join t2 on t1.id=t2.id")[0]._pprint_tree()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 Identifier 't1.'
| |- 0 Name 't1'
| `- 1 Punctuation '.'
|- 3 Error '日'
|- 4 Error '期'
|- 5 Whitespace ' '
|- 6 Keyword 'FROM'
|- 7 Whitespace ' '
|- 8 Identifier 't1'
| `- 0 Name 't1'
|- 9 Whitespace ' '
|- 10 Keyword 'left j...'
|- 11 Whitespace ' '
|- 12 Identifier 't2'
| `- 0 Name 't2'
|- 13 Whitespace ' '
|- 14 Keyword 'on'
|- 15 Whitespace ' '
`- 16 Comparison 't1.id=...'
|- 0 Identifier 't1.id'
| |- 0 Name 't1'
| |- 1 Punctuation '.'
| `- 2 Name 'id'
|- 1 Comparison '='
`- 2 Identifier 't2.id'
|- 0 Name 't2'
|- 1 Punctuation '.'
`- 2 Name 'id'
In [2]: sqlparse.parse("SELECT cast(t1.日期 as string) as 日期 FROM t1 left join t2 on t1.id=t2.id; ")[0]._pprint_tre
...: e()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 Identifier 'cast(t...'
| |- 0 Function 'cast(t...'
| | |- 0 Identifier 'cast'
| | | `- 0 Name 'cast'
| | `- 1 Parenthesis '(t1.日期...'
| | |- 0 Punctuation '('
| | |- 1 Identifier 't1.'
| | | |- 0 Name 't1'
| | | `- 1 Punctuation '.'
| | |- 2 Error '日'
| | |- 3 Identifier '期 as s...'
| | | |- 0 Error '期'
| | | |- 1 Whitespace ' '
| | | |- 2 Keyword 'as'
| | | |- 3 Whitespace ' '
| | | `- 4 Identifier 'string'
| | | `- 0 Name 'string'
| | `- 4 Punctuation ')'
| |- 1 Whitespace ' '
| |- 2 Keyword 'as'
| |- 3 Whitespace ' '
| `- 4 Error '日'
|- 3 Error '期'
|- 4 Whitespace ' '
|- 5 Keyword 'FROM'
|- 6 Whitespace ' '
|- 7 Identifier 't1'
| `- 0 Name 't1'
|- 8 Whitespace ' '
|- 9 Keyword 'left j...'
|- 10 Whitespace ' '
|- 11 Identifier 't2'
| `- 0 Name 't2'
|- 12 Whitespace ' '
|- 13 Keyword 'on'
|- 14 Whitespace ' '
|- 15 Comparison 't1.id=...'
| |- 0 Identifier 't1.id'
| | |- 0 Name 't1'
| | |- 1 Punctuation '.'
| | `- 2 Name 'id'
| |- 1 Comparison '='
| `- 2 Identifier 't2.id'
| |- 0 Name 't2'
| |- 1 Punctuation '.'
| `- 2 Name 'id'
|- 16 Punctuation ';'
`- 17 Whitespace ' '