sql-formatter icon indicating copy to clipboard operation
sql-formatter copied to clipboard

[FORMATTING] Space between type and options removed in nested BigQuery fields

Open Ewan-Keith opened this issue 2 years ago • 1 comments

When formatting BigQuery DDL statement with OPTIONS on a Struct field, any space between the type definition and the Options definition is removed, resulting in an invalid statement.

Input data

CREATE TABLE `project.dataset.format_test`
(
 str STRING OPTIONS(description="A string field."),
 int INT64  OPTIONS(description="An int64 field."),
 record STRUCT<nested INT64 OPTIONS(description="a struct field")> OPTIONS(description="a struct")
)

Expected Output

CREATE TABLE
  `project.dataset.format_test` (
    str STRING OPTIONS(description = "A string field."),
    int INT64 OPTIONS(description = "An int64 field."),
    record STRUCT<nested INT64 OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
  )

Actual Output

CREATE TABLE
  `project.dataset.format_test` (
    str STRING OPTIONS(description = "A string field."),
    int INT64 OPTIONS(description = "An int64 field."),
    record STRUCT<nested INT64OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
  )

Usage

To run this I've stored the input in a file called example-sql.sql and am running:

> sql-formatter --v
12.2.3

> sql-formatter -l bigquery example-sql.sql
CREATE TABLE
  `project.dataset.format_test` (
    str STRING OPTIONS(description = "A string field."),
    int INT64 OPTIONS(description = "An int64 field."),
    record STRUCT<nested INT64OPTIONS(description ="a struct field")> OPTIONS(description = "a struct")
  )

Ewan-Keith avatar Jul 10 '23 14:07 Ewan-Keith

Thanks for reporting. Definitely a bug.

I remember that the part that deals with this type-parameter formatting was pretty ugly and error-prone. Perhaps there's a simple fix I can make, perhaps not. More likely not. Will look into this.

In the mean time, perhaps you'd like to try out a new SQL formatting tool that I've written, which specifically includes full BigQuery support: https://github.com/nene/prettier-plugin-sql-cst

It formats the SQL somewhat differently, but at least it wont output invalid code. For your SQL it produces the following:

CREATE TABLE `project.dataset.format_test` (
  str STRING OPTIONS (description = "A string field."),
  int INT64 OPTIONS (description = "An int64 field."),
  record STRUCT<nested INT64 OPTIONS (description = "a struct field")> OPTIONS (
    description = "a struct"
  )
);

I'm not really sure whether it would be better to format options as OPTIONS(...) or OPTIONS (...). Or which style is more common in BigQuery community. Open to suggestions.

nene avatar Jul 11 '23 10:07 nene