Compute unique values (aggregate) over cells and group
Data scientist feedback suggests adding an aggregate function, used something like:
rf.agg(rf_agg_unique(rf.tile))
It should return a single Row with an ArrayType column containing every distinct cell value across all tiles in the column.
An alternative might be rf.select(rf_unique(rf.tile)), which would transform each tile into an ArrayType containing only that tile's distinct values.
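
To make the difference between the two proposals concrete, here is a rough sketch in plain Python. Nested lists stand in for tiles, and unique_cells / agg_unique_cells are hypothetical names that only mirror the proposed rf_unique / rf_agg_unique semantics; neither is an existing RasterFrames function. Results are sorted to match what numpy.unique returns; NoData handling is left out.

```python
from itertools import chain


def unique_cells(tile):
    """Per-tile variant: sorted distinct cell values of one 2-D tile."""
    return sorted(set(chain.from_iterable(tile)))


def agg_unique_cells(tiles):
    """Aggregate variant: sorted distinct cell values across every tile."""
    seen = set()
    for tile in tiles:
        seen.update(chain.from_iterable(tile))
    return sorted(seen)


# Two small 2x2 "tiles" standing in for a tile column (made-up values).
tiles = [
    [[1, 1], [2, 3]],
    [[3, 3], [4, 4]],
]

per_tile = [unique_cells(t) for t in tiles]  # one array per row
overall = agg_unique_cells(tiles)            # one array for the whole column

print(per_tile)
print(overall)
```

The per-tile form keeps one result per row, so it composes with other column functions; the aggregate form collapses the column to a single array, which is what the rf.agg(...) call above asks for.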
One possibility to work around this now might be df.select(rf_explode_tiles(tile).alias('cell_vals')).select('cell_vals').distinct()
Other related discussions:
A method for getting the distinct elements of an ArrayType column in Spark: https://stackoverflow.com/questions/37801889/get-the-distinct-elements-of-an-arraytype-column-in-a-spark-dataframe
NumPy unique: https://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html