sqlglot
sqlglot copied to clipboard
Oracle to Spark SQL. Failure to unnest queries during optimisation
Followup from slack message https://tobiko-data.slack.com/archives/C0448SFS3PF/p1715277495692029?thread_ts=1715277495.692029&cid=C0448SFS3PF
Fully reproducible code snippet
from sqlglot import parse_one
from sqlglot.optimizer import optimize
query = """
select
pt.c2,
t2.person_id AS person_id,
(
SELECT
MAX(t3.full_name) AS _col_0
FROM
db.third_table AS t3
WHERE
t3.person_id = t2.person_id
AND t3.effective_start_date <= TRUNC(t2.creation_date)
AND t3.effective_end_date >= TRUNC(t2.creation_date)
) AS person_full_name
from
parent_table as pt,
second_table as t2
where
pt.c1 = 'something'
and pt.second_col_id = t2.second_col_id
"""
tree = parse_one(query, dialect="oracle")
optimize(tree)
print(tree.sql("spark", pretty=True))
Python version: 3.11.2 SQLGlot version: 23.14.0
Expected output Nested query to be moved to a CTE and referenced to the parent select under a join clause.
The flow goes through when the inner query has a single filter in the where clause. But fails with more than one.