Operating on arrays of struct<int64, int64> in BigQuery
As a substitute for a map type, which is missing in BigQuery, we've implemented maps as arrays of <key, value> structs. Some of these keys and values are int64-typed.
Using Ibis, we can't flatten these arrays (#1146) and we can't write UDFs that accept these arrays as inputs (#1478; discussion in ibis-project/ibis#1469) to access members, because Ibis rejects int64s appearing anywhere in a BigQuery UDF signature. (We wrote a sql UDF for this use case outside Ibis.)
Trying to cast the struct members to another type (floats or strings) so that we can pass them to a UDF that Ibis accepts will fail on execution; BigQuery complains like:
BadRequest: 400 Casting between arrays with incompatible element types is not supported: Invalid cast from ARRAY<STRUCT<key INT64, value INT64>> to ARRAY<STRUCT<key STRING, value STRING>> at [17:33]
I think we're also prevented from doing anything baroque like calling TO_JSON_STRING on the column and having the UDF accept the JSON blob as a string because we can't call arbitrary BigQuery functions, though I see that's contemplated in the roadmap.
I would like to be able to define UDFs to operate on integer types. If you create a Javascript UDF signed to accept an int64 in BigQuery, BigQuery will actually pass in a string-encoded number. Maybe Ibis could accept a type named integer_string for the UDF definition to allow UDFs to work on these types while ensuring that a user's bought into this behavior?
Interesting. I wasn't aware that BigQuery supported integer return types. In any case, we could re-enable int64 usage, with the expectation that users know about the pitfalls of using them with BQ UDFs, which I will happily document and call out in a large red box.
Using Ibis, we can't flatten these arrays (#1146)
Just to confirm - the ibis unnest API doesn't work with a biguery backend? @cpcloud
@koshy1123 That is correct. The ibis implementation assumes that unnest can be used in the SELECT position. We did it this way to avoid blocking every other backend that we supported at the time which all support writing SELECT UNNEST(...) or its equivalent (duckdb, postgres, clickhouse, and pyspark).
In an ideal world BigQuery would allow SELECT UNNEST(...) and then the ibis implementation would be a few lines of code.
A slightly less desirable way, but that might unblock users of this library would be to implement support for the .sql API of TableExpr, which would at least allow users to write something like
expr = t.alias("my_t").sql("SELECT my_t.*, point.x, point.y FROM my_t CROSS JOIN UNNEST(points) AS point")
Oooh - interesting! Thanks for the insight @cpcloud.