rbc
rbc copied to clipboard
Example: histogram UDTF
Take a sparse histogram input of a given dimensionality and output a single column dense grid, i.e. input (x, y) => count(*) to a dense array/raster of count(*), which would have 0 values for where there were missing input values.
For example:
SQL('SELECT AVG(out0) FROM TABLE(dense_histogram_udtf(CURSOR(select cast((lon + 180.0) * 2.777777778 as int) as lon_bin, cast((lat + 90.0) * 2.7777777778 as int) as lat_bin, count(*) as n from tweets group by lon_bin, lat_bin), 0, 1000, 1000, 1000000))')
Needs multiple columnar inputs, user-supplied constant output sizing (as the number of output bins will be the x_bins * y_bins, which could be less or greater than the cardinality of the input query result set, and composable output such that we can project or do analytics on the result without materializing to a table