How to delete the 'ColumnarToRow' node in some physical plans when using geospatial predicates?

Open • freamdx opened this issue 7 months ago • 1 comment

First, create three types of geospatial tables:

'test' table using parquet
create table test (h3 int, geom geometry) using parquet;
'geo_test' table using geoparquet
create table geo_test (h3 int, geom geometry) using geoparquet;
'iceberg_test' table using iceberg
create table iceberg_test (h3 int, geom geometry) using iceberg;
To create an iceberg table with a geometry column, refer to this patch: (https://github.com/freamdx/iceberg/commit/929dfae730d41516c77adf6801da99a01e410810)

Then, run EXPLAIN on a spatial query such as 'select ... from ... where st_intersects(...)' against each table.

How can the 'ColumnarToRow' node be removed from some of the physical plans? When SpatialIndex is disabled and BatchScan is true, how can columnar processing be supported with predicates? Maybe some predicates should support vectorized computation...

May 27 '25 07:05 freamdx

Spark is a row-based compute engine. You see ColumnarToRow because Spark has vectorized readers for Parquet and Iceberg. Once the vectorized reader finishes, Spark has to convert the columnar batches back to a row-based layout so that the downstream row-based operators can consume them. You don't see it with the GeoParquet reader because Sedona's GeoParquet reader does not support vectorized reads yet.
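
To make the conversion step more concrete, here is a minimal conceptual sketch in plain Python. It is not Spark's or Sedona's actual implementation: the function name `columnar_to_rows`, the sample batch, and the stand-in predicate on `h3` are all hypothetical and only illustrate the idea of turning per-column vectors into rows before a row-at-a-time operator runs.

```python
from typing import Dict, Iterator, List, Tuple


def columnar_to_rows(batch: Dict[str, List[object]]) -> Iterator[Tuple[object, ...]]:
    """Return an iterator of row tuples from a batch stored as column name -> values."""
    columns = list(batch.values())
    # Every column vector in a batch must hold the same number of values.
    if len({len(col) for col in columns}) > 1:
        raise ValueError("column vectors differ in length")
    # zip(*columns) walks the columns in lockstep and emits one row at a time.
    return zip(*columns)


# Hypothetical batch as a vectorized reader might hand it over: one list per column.
batch = {
    "h3": [1, 2, 3],
    "geom": ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"],
}

# A row-at-a-time operator (here a stand-in predicate on h3) consumes the rows.
rows = [row for row in columnar_to_rows(batch) if row[0] >= 2]
print(rows)
```

In this picture, a predicate that could work directly on the column lists would not need the conversion step in front of it, which is roughly what "vectorized" support for a predicate would mean.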

May 27 '25 20:05 jiayuasu