
Raster Data Types in dataframe columns

Open normanb opened this issue 4 months ago • 2 comments

I have an interesting problem: when a DataFrame is saved to a Delta table, some functions produce a column of type udt while others produce a column of type binary. When the column holds a GeoTIFF, I would prefer the column type to be udt so that it is usable as a GridCoverage2D.

I am running on Databricks and Sedona is configured and working well. I am using DBT to define the models.

RS_UnionAgg produces a udt column, but RS_ReprojectMatch does not.

I am using RS_AsGeotiff as follows:

RS_AsGeotiff(RS_ReprojectMatch(vals.raster, ref)) as values

This produces a binary column, which is expected, but I want RS_ReprojectMatch itself to produce a udt column.

If I don't wrap RS_ReprojectMatch in RS_AsGeotiff, I get the following error:

Database Error org.apache.spark.sql.sedona_sql.UDT.RasterUDT$.<init>()

That is probably close to what I want, minus the error :) (as with RS_UnionAgg, which gives a udt column).

I am calling RS_UnionAgg after RS_TileExplode, and the resulting column correctly gets the udt type.

My workaround is to convert the column immediately after loading:

from pyspark.sql import functions as F
df = df_tmp.withColumn("values_raster", F.expr("RS_FromGeoTiff(values)")).drop("values")

I would prefer to avoid this workaround, although I could add it as a DBT post-hook to convert the column type.
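If several GeoTIFF columns need the same treatment, the workaround can be generalized into a small helper that builds `selectExpr` strings. This is only a sketch: the helper name `geotiff_to_raster_exprs` and the `_raster` suffix are hypothetical, and it assumes every listed column holds GeoTIFF bytes written by RS_AsGeotiff. The only Sedona function it references is RS_FromGeoTiff, as in the workaround above. The helper itself is plain Python, so it can be checked without a Spark session.

```python
from typing import Iterable, List


def _quote(name: str) -> str:
    # Spark SQL quotes identifiers with backticks; a literal backtick is doubled.
    return "`" + name.replace("`", "``") + "`"


def geotiff_to_raster_exprs(
    all_columns: Iterable[str],
    geotiff_columns: Iterable[str],
    suffix: str = "_raster",
) -> List[str]:
    """Hypothetical helper: build selectExpr() strings that decode GeoTIFF columns.

    Each column named in geotiff_columns becomes
    RS_FromGeoTiff(`col`) AS `col<suffix>`; all other columns pass through
    unchanged, and the original column order is preserved.
    """
    cols = list(all_columns)
    to_decode = set(geotiff_columns)

    # Fail early if a requested column is not present in the DataFrame.
    missing = to_decode.difference(cols)
    if missing:
        raise ValueError(f"columns not found: {sorted(missing)}")

    exprs = []
    for name in cols:
        if name in to_decode:
            # Decode the GeoTIFF bytes back into a raster (udt) column.
            exprs.append(f"RS_FromGeoTiff({_quote(name)}) AS {_quote(name + suffix)}")
        else:
            exprs.append(_quote(name))
    return exprs
```

With a DataFrame loaded from the Delta table, usage would look like `df = df_tmp.selectExpr(*geotiff_to_raster_exprs(df_tmp.columns, ["values"]))`. Unlike the `withColumn(...).drop(...)` version, this keeps each decoded column in its original position instead of appending it at the end.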

normanb, Sep 12 '25 23:09