Raster Data Types in dataframe columns
I have an interesting problem: for some functions the dataframe column type is set as udt when saved to a Delta table, and for other functions it is set as binary. When the column holds a GeoTIFF, I would prefer the column type to be udt so that it is a usable GridCoverage2D.
I am running on Databricks and Sedona is configured and working well. I am using DBT to define the models.
RS_UnionAgg results in a column type udt but RS_ReprojectMatch does not.
I am using RS_AsGeotiff as follows:
RS_AsGeotiff(RS_ReprojectMatch(vals.raster, ref)) as values
This results in a column of type binary, which is to be expected, but I want RS_ReprojectMatch to set the column type as udt.
If I don't wrap RS_ReprojectMatch in RS_AsGeotiff, I get the following error:
Database Error org.apache.spark.sql.sedona_sql.UDT.RasterUDT$.<init>()
That is probably close to what I want, minus the error :) (RS_UnionAgg, for comparison, gives a udt column type).
I am using RS_UnionAgg after calling RS_TileExplode and that defines the udt column type appropriately for a column.
My workaround is to convert the column immediately after loading:
df = df_tmp.withColumn("values_raster", F.expr("RS_FromGeoTiff(values)")).drop("values")
I would like to avoid the workaround, but I guess I could add it as a post-hook to convert the column type.
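For reference, here is a minimal sketch of how the post-load conversion could be generalized. The helper name raster_select_exprs and its arguments are hypothetical, not part of Sedona or dbt; it assumes the schema is given in the shape of df.dtypes (pairs of column name and type string, where GeoTIFF bytes show up as "binary") and that RS_FromGeoTiff is registered in the session. It only builds the SQL expression strings, so it can be checked without a cluster.

```python
from typing import Iterable, List, Tuple


def raster_select_exprs(
    dtypes: Iterable[Tuple[str, str]],
    geotiff_columns: Iterable[str],
) -> List[str]:
    """Build selectExpr strings that decode chosen binary columns with RS_FromGeoTiff.

    Hypothetical helper: columns named in geotiff_columns are wrapped only if
    their type is "binary"; every other column is passed through unchanged.
    """
    targets = set(geotiff_columns)
    exprs = []
    for name, dtype in dtypes:
        if name in targets and dtype == "binary":
            # Decode the GeoTIFF bytes back into a raster, keeping the column name.
            exprs.append(f"RS_FromGeoTiff(`{name}`) AS `{name}`")
        else:
            # Already a raster (udt) or not a raster column at all: leave as-is.
            exprs.append(f"`{name}`")
    return exprs


# Example with a hand-written schema in the shape of df.dtypes.
example = raster_select_exprs([("id", "bigint"), ("values", "binary")], ["values"])
print(example)
```

With a live session this would presumably be applied as df = df_tmp.selectExpr(*raster_select_exprs(df_tmp.dtypes, ["values"])), which keeps the original column name instead of adding values_raster and dropping values.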