Support Chinese parenthesis characters in MySQL

Open jiangyayu opened this issue 1 year ago • 9 comments

Input data

SELECT `时间` as "时间",SUM(进度(计划完成率)) as "SUM(`进度(计划完成率)`)" FROM ds_upload_19 WHERE 1=1 GROUP BY `时间` LIMIT 1000 

Expected Output

SELECT
  `时间` as "时间",
  SUM(进 度 ( 计 划 完 成 率 )) as "SUM(`进度(计划完成率)`)"
FROM
  ds_upload_19
WHERE
  1 = 1
GROUP BY
  `时间`
LIMIT
  1000

Actual Output

SELECT `时间` as "时间",SUM(进度(计划完成率)) as "SUM(`进度(计划完成率)`)" FROM ds_upload_19 WHERE 1=1 GROUP BY `时间` LIMIT 1000 

Usage

  • How are you calling / using the library?
  • What SQL language(s) does this apply to?
  • Which SQL Formatter version are you using?

Jun 13 '23 12:06 jiangyayu

Are you able to provide more context?

Jun 13 '23 18:06 grantwforsythe

The formatter works as expected. -- starts a line comment in SQL.

This is also demonstrated by how GitHub syntax-highlights this code (grayed out as a comment).

Jun 14 '23 09:06 nene

Sorry, that was my mistake. The correct SQL, without the --, is SELECT 时间 as "时间",SUM(进度(计划完成率)) as "SUM(进度(计划完成率))" FROM ds_upload_19 WHERE 1=1 GROUP BY 时间 LIMIT 1000

Jun 14 '23 10:06 jiangyayu

The formatted result is correct when I use the ASCII "(" instead of the Chinese "(", so the problem may lie there.

Jun 14 '23 11:06 jiangyayu

Are you able to provide more context?

There is no more context.

Jun 14 '23 11:06 jiangyayu

So, as I understand it, the issue is with some sort of Unicode parenthesis character. I don't know what role this character plays in the language, how it should be treated in SQL, or how the SQL dialect you're using treats it.

To simplify diagnosing the problem, could you rewrite the problematic SQL with the minimum number of non-ASCII characters?

Also, you haven't mentioned which SQL dialect you are using: MySQL, SQLite, etc.?

Jun 14 '23 11:06 nene

Simplified SQL: select str(str)from db

This is a variant of MySQL without character restrictions, and the Chinese character "(" plays the same role as the English character "(".
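Because the two characters look almost identical, it can help to locate them by code point when diagnosing such input. The following is a minimal sketch, assuming the "Chinese parentheses" in question are the fullwidth forms U+FF08 and U+FF09; the helper name findFullwidthParens is hypothetical and not part of sql-formatter.

```typescript
// Assumption: the Chinese parentheses are the fullwidth forms U+FF08 and U+FF09.
const FULLWIDTH_OPEN = "\uFF08";
const FULLWIDTH_CLOSE = "\uFF09";

interface ParenHit {
  index: number;     // position in the input string
  char: string;      // the character that was found
  codePoint: string; // e.g. "U+FF08"
}

// Hypothetical diagnostic helper: lists every fullwidth parenthesis in the input.
function findFullwidthParens(sql: string): ParenHit[] {
  const hits: ParenHit[] = [];
  for (let i = 0; i < sql.length; i++) {
    const char = sql.charAt(i);
    if (char === FULLWIDTH_OPEN || char === FULLWIDTH_CLOSE) {
      const codePoint = "U+" + char.charCodeAt(0).toString(16).toUpperCase();
      hits.push({ index: i, char: char, codePoint: codePoint });
    }
  }
  return hits;
}

// The simplified query, written with escapes so the fullwidth characters are unambiguous.
const simplifiedHits = findFullwidthParens("select str\uFF08str\uFF09from db");
console.log(simplifiedHits); // two hits, at index 10 (U+FF08) and index 14 (U+FF09)
```

An ASCII-only query yields an empty list, which is a quick way to confirm whether a given input is affected at all.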

Jun 14 '23 12:06 jiangyayu

Thanks for the explanation @jiangyayu.

I'll need to do some research into how this issue impacts (or doesn't impact) other dialects.

It definitely won't be a simple thing to fix.

A few additional questions, to make sure I get things right:

  • If the formatter replaced all these Chinese "(" characters with plain ASCII "(", that probably wouldn't be acceptable, right?
  • If one uses the Chinese open-paren character, is it mandatory to close it with the Chinese close-paren character too, or can the Chinese and ASCII variants be used interchangeably?

Jun 14 '23 12:06 nene

For the first question: yes, that's right. If the input contains Chinese "(" characters and the output turns them into "(", the formatter has changed the input, so I think that is not acceptable.

In Chinese, the open-paren and close-paren characters should be used in pairs. The grammar is incorrect if only one of them is used.
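The pairing rule can be sketched as a small stack-based check. This is only an illustration under assumptions: it takes the Chinese parentheses to be the fullwidth forms U+FF08 and U+FF09, and it takes the strict reading that an opener must be closed by the same variant, which this thread does not settle for MySQL. The function name is hypothetical, and the sketch deliberately ignores string literals, quoted identifiers and comments.

```typescript
// Assumption: fullwidth parentheses are U+FF08 (open) and U+FF09 (close).
const FW_OPEN_PAREN = "\uFF08";
const FW_CLOSE_PAREN = "\uFF09";

// Hypothetical check: every opener must be closed by the closer of the same variant.
// Simplification: parentheses inside strings, quoted identifiers or comments are not skipped.
function parensAreConsistentlyPaired(sql: string): boolean {
  const expectedClosers: string[] = [];
  for (let i = 0; i < sql.length; i++) {
    const ch = sql.charAt(i);
    if (ch === "(") {
      expectedClosers.push(")");
    } else if (ch === FW_OPEN_PAREN) {
      expectedClosers.push(FW_CLOSE_PAREN);
    } else if (ch === ")" || ch === FW_CLOSE_PAREN) {
      // A closer must match the most recent unclosed opener.
      if (expectedClosers.pop() !== ch) {
        return false;
      }
    }
  }
  // Any leftover entry is an opener that was never closed.
  return expectedClosers.length === 0;
}

console.log(parensAreConsistentlyPaired("SUM(a\uFF08b\uFF09)")); // true
console.log(parensAreConsistentlyPaired("SUM(a\uFF08b))"));      // false: variants are mixed
```

Note that the check leaves the input untouched and only reports a result, which is in line with the requirement above that the formatter must not rewrite the characters.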

Jun 15 '23 01:06 jiangyayu