sqlglot
sqlglot copied to clipboard
Optimizer fails in some specific cases
Sqlglot version: 26.25.3
Reading dialect: BigQuery Writing dialect: N/A
Fully reproducible code snippet
from loguru import logger
from sqlglot import Expression
from sqlglot.dialects.dialect import Dialect
from sqlglot.errors import OptimizeError
from sqlglot.optimizer.qualify_columns import (
pushdown_cte_alias_columns,
qualify_columns,
quote_identifiers,
validate_qualify_columns,
)
from sqlglot.optimizer.qualify_tables import qualify_tables
from sqlglot.schema import ensure_schema
def qualify(expression: Expression, dialect="bigquery"):
schema = ensure_schema(None, dialect=dialect)
expression = qualify_tables(expression, schema=schema)
if Dialect.get_or_raise(dialect).PREFER_CTE_ALIAS_COLUMN:
expression = pushdown_cte_alias_columns(expression)
try:
expression = qualify_columns(
expression,
schema,
expand_alias_refs=True,
expand_stars=True,
infer_schema=None,
)
except OptimizeError as e:
logger.error(f"Exception while qualifying query: [{e}].")
raise
expression = quote_identifiers(expression, dialect=dialect, identify=True)
validate_qualify_columns(expression)
return expression
ast = qualify(sqlglot.parse_one(sql, dialect="bigquery"))
This sql code fails:
WITH some_cte AS (
SELECT
foo,
bar
FROM `database.schema.table`
)
SELECT
*,
TO_JSON_STRING(some_cte) AS serialized_row
FROM some_cte;
With the error message
File "(...)/sqlglot/optimizer/qualify_columns.py", line 126, in validate_qualify_columns
sqlglot.errors.OptimizeError: Column '"some_cte"' could not be resolved
And here is a slightly different example that does work:
WITH some_cte AS (
SELECT
*
FROM `database.schema.table`
)
SELECT
*,
TO_JSON_STRING(some_cte) AS serialized_row
FROM some_cte;
Both examples are valid in bigquery.
Thanks, this is a legit bug. An easier way to see what's happening is:
from sqlglot import parse_one
from sqlglot.optimizer.qualify import qualify
ast = parse_one("with t as (select 1) select to_json_string(t) from t", dialect="bigquery")
qualify(ast, dialect="bigquery")
# sqlglot.errors.OptimizeError: Column '"t"' could not be resolved
Both this and your example use a table as a column, which is valid in BigQuery because it treats it as a struct with the table's columns for its fields. Sqlglot's optimizer doesn't fully support this behavior today, but we'll take a look.