malloy
malloy copied to clipboard
Error using a dimension w/ a nested field reference in a join
What happens?
I have a Source with a nested field, and I create a dimension on that Source that references the nested field:
source: raw_vals is bigquery.table(`mytable`) extend {
dimension: myval is nested_field.nested_val
}
I create another Source and attempt to use this dimension as a join key:
source: grouped_tbl is raw_vals -> {
group_by: myval
} extend {
join_many: raw_vals on myval = raw_vals.myval
}
The SQL generated by this query fails to compile:
Unrecognized name: nested_field_0; Did you mean nested_field? at [13:43]
See the example in this file
To Reproduce
Run the example here: https://github.com/malloydata/malloy-samples/blob/a194dcea7c51d986b2d195729dbc6e32cb14b69e/bigquery/ga4/join_nest.malloy#L12-L15
In the malloy-samples repo in the ce/unnest_join branch
OS:
macOS
Malloy Client:
Local VSCode
Malloy Client Version:
v0.3.1699380172
Database Connection:
BigQuery
Related: though this issue is probably easier to fix than mine https://github.com/malloydata/malloy/issues/1482
It is the same bug. The problem is that the join predicate implies an UNNEST and that UNNEST has to happen before the the table is joined in. I don't have an elegant solution to solve this just yet.
So this is what the SQL looks like when you try and join on a nested field. The problem is that the joins are all performed in order.
When we get to the on clause, a.value is not yet defined.
SELECT
....
FROM base
LEFT JOIN table_with_nest ON base.id = a.value
LEFT JOIN UNNEST (table_with_nest.some_array) as a(value)
We could, when unnesting, unnest all the related data into a single alias.
SELECT
....
FROM base
LEFT JOIN (
SELECT t.*, nest.nest_value as unique_name FROM table_with_nest t
LEFT JOIN UNNEST (table_with_nest.some_array) as nest(nest_value)
) as unnested_table ON base.id =unnested_table.unique_name
We'd have to see the performance implications of this. I could see where it could be quite slow if it subverts predicate pushdown.
It may be better to just detect the error and have the user do this themselves.