sql-metadata icon indicating copy to clipboard operation
sql-metadata copied to clipboard

ctas with column name dist/sort key breaks parser.tables

Open tipanverella opened this issue 2 years ago • 0 comments

I am back! You folks do such good work, I have to keep pushing on your product.

Here is the issue,

I wrote the following 3 test:

def test_create_table():
    query = "CREATE TABLE my_table AS SELECT * FROM source_table"
    parser = Parser(query)
    assert set(parser.tables) == {"my_table", "source_table"}

def test_create_dist_table():
    query = (
        "CREATE TABLE my_table distkey(1) sortkey(1,3) AS SELECT "
        "col1, col2, col3 FROM source_table"
    )
    parser = Parser(query)
    assert set(parser.tables) == {"my_table", "source_table"}

def test_create_dist_column_name_table():
    query = (
        "CREATE TABLE my_table distkey(col1) sortkey(col1, col3) AS SELECT "
        "col1, col2, col3 FROM source_table"
    )
    parser = Parser(query)
    assert set(parser.tables) == {"my_table", "source_table"}

The first two passed, i.e. parser.tables recognizes tables inside of CTAS, even when one specifies distribution and sorting keys. However the last test fails:

========================================== FAILURES ==========================================
_____________________________ test_create_dist_column_name_table _____________________________

    def test_create_dist_column_name_table():
        query = (
            "CREATE TABLE my_table distkey(col1) sortkey(col1, col3) AS SELECT "
            "col1, col2, col3 FROM source_table"
        )
        parser = Parser(query)
>       assert set(parser.tables) == {"my_table", "source_table"}
E       AssertionError: assert {'col3', 'my_table', 'source_table', 'col1'} == {'my_table', 'source_table'}
E         Extra items in the left set:
E         'col1'
E         'col3'
E         Full diff:
E         - {'my_table', 'source_table'}
E         + {'col3', 'my_table', 'source_table', 'col1'}
E         ?  ++++++++                          ++++++++

test/test_getting_tables.py:681: AssertionError
================================== short test summary info ===================================
FAILED test/test_getting_tables.py::test_create_dist_column_name_table - AssertionError: assert {'col3', 'my_table', 'source_table', 'col1'} == {'my_table', 'sour...
=============================== 1 failed, 141 passed in 0.58s ================================

As you can see, referring to the columns by their names (which is good practice) trips up parser.tables.

Please let me know if there is any additional information I can provide to further clarify this issue. I am also will to help fix it. Like I said up top, you folks do good work. I will help you in any way I can.

tipanverella avatar Jan 31 '23 01:01 tipanverella