sql-metadata
sql-metadata copied to clipboard
ctas with column name dist/sort key breaks parser.tables
I am back! You folks do such good work, I have to keep pushing on your product.
Here is the issue,
I wrote the following 3 test:
def test_create_table():
query = "CREATE TABLE my_table AS SELECT * FROM source_table"
parser = Parser(query)
assert set(parser.tables) == {"my_table", "source_table"}
def test_create_dist_table():
query = (
"CREATE TABLE my_table distkey(1) sortkey(1,3) AS SELECT "
"col1, col2, col3 FROM source_table"
)
parser = Parser(query)
assert set(parser.tables) == {"my_table", "source_table"}
def test_create_dist_column_name_table():
query = (
"CREATE TABLE my_table distkey(col1) sortkey(col1, col3) AS SELECT "
"col1, col2, col3 FROM source_table"
)
parser = Parser(query)
assert set(parser.tables) == {"my_table", "source_table"}
The first two passed, i.e. parser.tables recognizes tables inside of CTAS, even when one specifies distribution and sorting keys. However the last test fails:
========================================== FAILURES ==========================================
_____________________________ test_create_dist_column_name_table _____________________________
def test_create_dist_column_name_table():
query = (
"CREATE TABLE my_table distkey(col1) sortkey(col1, col3) AS SELECT "
"col1, col2, col3 FROM source_table"
)
parser = Parser(query)
> assert set(parser.tables) == {"my_table", "source_table"}
E AssertionError: assert {'col3', 'my_table', 'source_table', 'col1'} == {'my_table', 'source_table'}
E Extra items in the left set:
E 'col1'
E 'col3'
E Full diff:
E - {'my_table', 'source_table'}
E + {'col3', 'my_table', 'source_table', 'col1'}
E ? ++++++++ ++++++++
test/test_getting_tables.py:681: AssertionError
================================== short test summary info ===================================
FAILED test/test_getting_tables.py::test_create_dist_column_name_table - AssertionError: assert {'col3', 'my_table', 'source_table', 'col1'} == {'my_table', 'sour...
=============================== 1 failed, 141 passed in 0.58s ================================
As you can see, referring to the columns by their names (which is good practice) trips up parser.tables.
Please let me know if there is any additional information I can provide to further clarify this issue. I am also will to help fix it. Like I said up top, you folks do good work. I will help you in any way I can.