
Incorrect JSON wrapper breaks jq

andy65535 opened this issue 3 years ago · 3 comments

Describe the bug

The commonly used Linux command-line tool jq doesn't work with the JSON prepared by sdp, because jq expects values to be wrapped in double quotes. Example with a real GCP BigQuery DDL:

DDL:

$ cat sample_ddl.sql

CREATE  TABLE `dataset.table`
(
    Updated_Date           DATE   NOT NULL OPTIONS (description =Description1),
    Lead_Store_Id          INT64  NOT NULL OPTIONS (description =Description2),
)
PARTITION BY LeadCreatedDate
CLUSTER BY Lead_Store_Id

OUTPUT:

$ sdp -o bigquery --no-dump sample_ddl.sql
Start parsing file sample_ddl.sql

File with result was saved to >> schemas folder
[{'alter': {},
  'checks': [],
  'cluster_by': ['Lead_Store_Id'],
  'columns': [{'check': None,
               'default': None,
               'name': 'Updated_Date',
               'nullable': False,
               'options': [{'description': '=Description1'}],
               'references': None,
               'size': None,
               'type': 'DATE',
               'unique': False},
              {'check': None,
               'default': None,
               'name': 'Lead_Store_Id',
               'nullable': False,
               'options': [{'description': '=Description2'}],
               'references': None,
               'size': None,
               'type': 'INT64',
               'unique': False}],
  'constraints': {'checks': None, 'references': None, 'uniques': None},
  'index': [],
  'partition_by': {'columns': ['LeadCreatedDate'], 'type': None},
  'partitioned_by': [],
  'primary_key': [],
  'schema': '`dataset',
  'table_name': 'table`',
  'tablespace': None}]

JQ run:

$ sdp -o bigquery --no-dump sample_ddl.sql |  egrep -v 'Start parsing file |File with result was saved to ' | jq .
parse error: Invalid numeric literal at line 2, column 10

JSON like the following would be OK:

[{"alter": {},
  "checks": [],
  "cluster_by": ["Lead_Store_Id"],
  "columns": [{"check": "None",
               "default": "None",
               "name": "Updated_Date",
               "nullable": "False",
               "options": [{"description": "=Description1"}],
               "references": "None",
               "size": "None",
               "type": "DATE",
               "unique": "False"},
              {"check": "None",
               "default": "None",
               "name": "Lead_Store_Id",
               "nullable": "False",
               "options": [{"description": "=Description2"}],
               "references": "None",
               "size": "None",
               "type": "INT64",
               "unique": "False"}],
  "dataset": "`dataset",
  "index": [],
  "partition_by": {"columns": ["LeadCreatedDate"], "type": "None"},
  "partitioned_by": [],
  "primary_key": [],
  "table_name": "table`",
  "tablespace": "None"}]

There are a few issues here:

  1. jq expects that all values are wrapped (None, False, True as well).

  2. jq expects the wrapper to be a double quote, not a single quote. The JSON validator https://jsonformatter.curiousconcept.com/ says that points 1 and 2 are RFC violations.

  3. Parsing is incorrect when the table name is wrapped in back quotes: "CREATE TABLE `dataset.table`". As you can see from the output, the parser kept the back quote at the beginning (and sometimes at the end) of the table name:

 'schema': '`dataset',

andy65535 · Mar 11 '22

@andy65535 Hi! This is because the stdout output is not JSON but a Python dict (and that is why you see None; in JSON it must be null). If you want JSON, remove the --no-dump flag and it will dump JSON to a file. I hadn't thought about the use case where someone needs JSON on stdout :) In the next release I will add a --json flag to get output on stdout as JSON, not as a Python object.
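
Until that flag exists, a rough sketch of getting JSON on stdout from Python might look like this (assuming the output_mode argument of run() corresponds to the CLI's -o option; the file handling is just for illustration):

import json

from simple_ddl_parser import DDLParser

# Read the same DDL file used in the report above.
with open("sample_ddl.sql") as f:
    ddl = f.read()

# run() returns a list of Python dicts; json.dumps turns None into null
# and single quotes into double quotes, so jq can parse the result.
result = DDLParser(ddl).run(output_mode="bigquery")
print(json.dumps(result, indent=2))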

xnuinside · Mar 11 '22

@xnuinside thank you for the reply! I expected that stdout would be the same as the output without --no-dump or with the -t parameter. I use stdout because I need to specify another filename instead of the default behavior (or -t, which creates a folder for the outputs). Anyway, I removed --no-dump and the JSON is now formatted as expected. But point 3 about back quotes from my initial report is still in place:

CREATE TABLE `dataset.table`

the parser leaves the back quotes in and the final JSON becomes:

...
    "table_name": "table`",
...                     ^
    "dataset": "`dataset"
...             ^

andy65535 · Mar 26 '22

@andy65535 hi, I'm ready to kill myself for my previous answer to you! First of all ))) you can get JSON output with Python code )) (I really need to spend time on documentation improvement) - you can do it like this (sample added in the README): https://github.com/xnuinside/simple-ddl-parser#get-output-in-json

Second - to get output without artifacts like ` at the end, you can use the flag "normalize_names=True" like here: https://github.com/xnuinside/simple-ddl-parser/blob/07f8b1f5c4c6b4aaf15a0fce4ce29fc8b213757a/tests/non_statement_tests/test_common.py#L49

So if you run the parser like:

result = DDLParser(ddl, normalize_names=True).run(json_dump=True, group_by_type=True)

you will get the required output.

By default (without normalize_names=True) the parser leaves all symbols as-is, to make it possible to reconstruct the DDL from the parser output.
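
For reference, a self-contained version of that call might look roughly like this (the file read is just illustration; as the README example suggests, json_dump=True makes run() return a JSON string):

from simple_ddl_parser import DDLParser

with open("sample_ddl.sql") as f:
    ddl = f.read()

# normalize_names=True strips the wrapping back quotes from identifiers,
# json_dump=True returns a JSON string instead of a Python object,
# group_by_type=True groups the parsed statements by their type.
result = DDLParser(ddl, normalize_names=True).run(json_dump=True, group_by_type=True)
print(result)  # "table_name" should now come out as "table", without the trailing back quote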

Again, one more time SORRY for my wrong answer )

With the CLI it is not possible to use the normalize_names flag yet - in the next release I will work on updating the CLI to support all parser arguments & settings.

xnuinside · Mar 28 '22