sql-formatter icon indicating copy to clipboard operation
sql-formatter copied to clipboard

Error when formatting Spark SQL

Open liyuanhao6 opened this issue 1 year ago • 5 comments

First Bug

set livy.session.conf.spark.driver.maxResultSize=8g;
set livy.session.conf.spark.driver.memory=12g;
set livy.session.conf.spark.driver.memoryOverhead=8g;
set livy.session.conf.spark.kryoserializer.buffer.max=512m;

Unable to format SQL: Error: Parse error: Unexpected "8g; set li" at line 1 column 50

We usually use the Spark configurations in the SQL codes, and it reports the bug. We suppose those configurations are useful, and could you please help us fix this bug?

Second Bug

WHERE
    dayno = ${v_day}

Unable to format SQL: Error: Parse error: Unexpected "${v_day} )" at line 112 column 17

We usually use shell parameter in the SQL codes, and it reports the bug. We suppose those parameters are useful, and could you please help us fix this bug?

liyuanhao6 avatar Jun 12 '24 09:06 liyuanhao6

Thanks for reporting.

  • The second problem can be dealt with the help of paramTypes configuration. See the FAQ.

  • The first problem is a case of Spark SQL syntax which I haven't yet seen. I'll look into fixing this. Though during summer I have little time for open source.

nene avatar Jun 21 '24 17:06 nene

@liyuanhao6 if you are using VSCode try:

  "SQL-Formatter-VSCode.paramTypes": {
    "custom": [
      {
        "regex": "\\$?\\{[a-zA-Z0-9_]+\\}"
      }
    ]
  },

for the second problem.

amadeuspzs avatar Jul 03 '24 16:07 amadeuspzs

@amadeuspzs Thank you for the fix!

Interesting though, that I could format my sparksql code without this fix in the settings. However after formatting our special param syntax, I'm getting an unnecessary whitespace after the last } as example below:

--- code before formatting:

create or replace view ${unity_catalog_name}.${environment_name}_sublayer.tablename 

--- code after formatting:

create or replace view ${unity_catalog_name}.${environment_name} _sublayer.tablename -- whitespace appears after closing '}'

palinkasnorbert avatar Nov 06 '24 12:11 palinkasnorbert

Yeah, this ${...} syntax is actually supported by default.

However the formatter assumes that ${...} constitutes some kind of identifier. If it works as just a plain string substitution, then all bets are off. In that case I guess you could also have other SQL syntax as part of these substitutions, in which case the formatter would have no idea how it should format it:

CREATE ${orReplace} ${entity} ${type};

But really I know very little about Spark. I'm not even sure if this ${...} syntax is part of Apache Spark itself or comes from some other tool.

nene avatar Nov 06 '24 13:11 nene

We're using ${...} as a plain string substitution in SparkSQL cells in Azure Synapse Analytics, so yes, it acts as a substitution for any string.

The additional whitespace looks like a result of nene's explanation that the formatter assumes it's an identifier, rather than a general purpose string replacement.

This is why whitespace isn't introduced after a . e.g.

${one}.${two}

whereas

${one}_${two}

becomes

${one} _ ${two}

One workaround would be to use/derive a "whole word" parameter instead of partial one e.g.

spark.conf.set("environment_name_sublayer", f"{environment_name}_sublayer")

followed by

create or replace view ${unity_catalog_name}.${environment_name_sublayer}.tablename 

which yields

CREATE    OR REPLACE VIEW ${unity_catalog_name}.${environment_name_sublayer}.tablename

amadeuspzs avatar Nov 14 '24 12:11 amadeuspzs