sql-metadata icon indicating copy to clipboard operation
sql-metadata copied to clipboard

Issues with aliases and 'with' clause while finding columns

Open Byunk opened this issue 2 years ago • 0 comments

Hello @macbre. I'm addressing the Attribute Errors we discussed before, but I've come across a new bug. This one appears to be related to issues with aliases and 'with' queries, and it seems it might require a significant number of changes to fix.

Testing query

WITH customer_total_return AS
  (SELECT sr_customer_sk AS ctr_customer_sk,
          sr_store_sk AS ctr_store_sk,
          sum(SR_FEE) AS ctr_total_return
   FROM store_returns,
        date_dim
   WHERE sr_returned_date_sk = d_date_sk
     AND d_year =2000
   GROUP BY sr_customer_sk,
            sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1,
     store,
     customer
WHERE ctr1.ctr_total_return >
    (SELECT avg(ctr_total_return)*1.2
     FROM customer_total_return ctr2
     WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  AND s_store_sk = ctr1.ctr_store_sk
  AND s_state = 'SD'
  AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100;

Running this query with #437 gives us a result that differs from what I'd expect. The parser doesn't store aliases for 'with' queries, so it can't expand 'ctr1' and 'ctr2' to 'customer_total_return' in the query.

Actual Result

{
    "group_by": [
        "sr_customer_sk",
        "sr_store_sk"
    ],
    "order_by": [
        "c_customer_id"
    ],
    "select": [
        "sr_customer_sk",
        "sr_store_sk",
        "SR_FEE",
        "c_customer_id"
    ],
    "where": [
        "sr_returned_date_sk",
        "d_date_sk",
        "d_year",
        "ctr1.ctr_total_return",
        "ctr1.ctr_store_sk",
        "ctr2.ctr_store_sk",
        "s_store_sk",
        "s_state",
        "ctr1.ctr_customer_sk",
        "c_customer_sk"
    ]
}

Expected Result

{
    "group_by": [
        "sr_customer_sk",
        "sr_store_sk"
    ],
    "order_by": [
        "c_customer_id"
    ],
    "select": [
        "sr_customer_sk",
        "sr_store_sk",
        "SR_FEE",
        "c_customer_id"
    ],
    "where": [
        "sr_returned_date_sk",
        "d_date_sk",
        "d_year",
        "s_store_sk",
        "s_state",
        "c_customer_sk"
    ]
}

Byunk avatar Oct 11 '23 14:10 Byunk