sqlglot icon indicating copy to clipboard operation
sqlglot copied to clipboard

feat(optimizer)!: Expand stars on BigQuery's tbl.struct_col.* selections

Open VaggelisD opened this issue 1 year ago • 1 comments

Fixes #3484

BigQuery supports star expansion such as foo.bar.* where foo is the table and bar is a struct. This is essentially a flatten operation from the specified struct and down. For example, if bar has a nested struct baz, then foo.bar.baz.* would cause a flatten from baz downwards.

In order to replicate this behavior, a DFS traversal starting from the top-level struct gathers the non-struct fields and enqueues the nested structs (if any). At every point, a name / fully qualified path is kept and prefixes the fields at that level.

Note: If any nested field is not aliased, then the star expansion is stopped and the star` is preserved; This is because BigQuery will alias these fields dynamically so it's not possible to qualify/target these fields statically afaict.

An example:

Schema
-----------
foo: 'bar' STRUCT<'f0' STRING, 'baz' STRUCT<'f1' STRING>>

Query
--------
SELECT foo.bar.* FROM foo;

Star Expansion
------------
1. Enqueue the top level struct which is bar. Current path is initiazed as empty string
2. Pop the stack's top item and (1) qualify it with the current path name if it's a field or (2) append it to the stack if it's a nested struct.
3. Pop the next item if any, repeat 2

Result
-------
SELECT foo.bar.f0, foo.bar.baz.f1 FROM foo;

Docs

VaggelisD avatar May 22 '24 13:05 VaggelisD

Changed to DFS so as to match BQ's flatten order (left to right), just in case the previous order added side effects

VaggelisD avatar May 22 '24 14:05 VaggelisD