sqlglot icon indicating copy to clipboard operation
sqlglot copied to clipboard

Optimizer fails in some specific cases

Open Bladieblah opened this issue 6 months ago • 1 comments

Sqlglot version: 26.25.3

Reading dialect: BigQuery Writing dialect: N/A

Fully reproducible code snippet

from loguru import logger
from sqlglot import Expression
from sqlglot.dialects.dialect import Dialect
from sqlglot.errors import OptimizeError
from sqlglot.optimizer.qualify_columns import (
  pushdown_cte_alias_columns,
  qualify_columns,
  quote_identifiers,
  validate_qualify_columns,
)
from sqlglot.optimizer.qualify_tables import qualify_tables
from sqlglot.schema import ensure_schema


def qualify(expression: Expression, dialect="bigquery"):
  schema = ensure_schema(None, dialect=dialect)
  expression = qualify_tables(expression, schema=schema)

  if Dialect.get_or_raise(dialect).PREFER_CTE_ALIAS_COLUMN:
    expression = pushdown_cte_alias_columns(expression)

  try:
    expression = qualify_columns(
      expression,
      schema,
      expand_alias_refs=True,
      expand_stars=True,
      infer_schema=None,
    )
  except OptimizeError as e:
    logger.error(f"Exception while qualifying query: [{e}].")
    raise

  expression = quote_identifiers(expression, dialect=dialect, identify=True)
  validate_qualify_columns(expression)

  return expression

ast = qualify(sqlglot.parse_one(sql, dialect="bigquery"))

This sql code fails:

WITH some_cte AS (
    SELECT 
      foo,
      bar
    FROM `database.schema.table`
)

SELECT
  *,
  TO_JSON_STRING(some_cte) AS serialized_row
FROM some_cte;

With the error message

File "(...)/sqlglot/optimizer/qualify_columns.py", line 126, in validate_qualify_columns
  sqlglot.errors.OptimizeError: Column '"some_cte"' could not be resolved

And here is a slightly different example that does work:

WITH some_cte AS (
    SELECT 
      *
    FROM `database.schema.table`
)

SELECT
  *,
  TO_JSON_STRING(some_cte) AS serialized_row
FROM some_cte;

Both examples are valid in bigquery.

Bladieblah avatar Jun 11 '25 09:06 Bladieblah

Thanks, this is a legit bug. An easier way to see what's happening is:

from sqlglot import parse_one
from sqlglot.optimizer.qualify import qualify

ast = parse_one("with t as (select 1) select to_json_string(t) from t", dialect="bigquery")
qualify(ast, dialect="bigquery")
# sqlglot.errors.OptimizeError: Column '"t"' could not be resolved

Both this and your example use a table as a column, which is valid in BigQuery because it treats it as a struct with the table's columns for its fields. Sqlglot's optimizer doesn't fully support this behavior today, but we'll take a look.

georgesittas avatar Jun 11 '25 09:06 georgesittas