sedona
How to remove the 'ColumnarToRow' node from a physical plan when using geospatial predicates?
- First, create three types of geospatial tables:
- 'test' table using parquet
  create table test (h3 int, geom geometry) using parquet;
- 'geo_test' table using geoparquet
  create table geo_test (h3 int, geom geometry) using geoparquet;
- 'iceberg_test' table using iceberg
  create table iceberg_test (h3 int, geom geometry) using iceberg;
  To create an iceberg table with a geometry column, refer to this patch: (https://github.com/freamdx/iceberg/commit/929dfae730d41516c77adf6801da99a01e410810)
- Then, run EXPLAIN on a spatial query of the form 'select ... from ... where st_intersects(...)'.
- How can the 'ColumnarToRow' node be removed from some of the physical plans? When SpatialIndex is disabled and BatchScan is true, how can columnar processing be supported with predicates? Maybe some predicates should support vectorized computation...
Spark is a row-based compute engine. You see ColumnarToRow because Spark has vectorized Parquet and Iceberg readers. Once the vectorized reader finishes, Spark has to convert its columnar batches back to the row-based layout. You don't see it with the GeoParquet reader because Sedona's GeoParquet reader does not support vectorized reads yet.
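If the goal is only to get a plan without ColumnarToRow, one option is to turn the vectorized readers off. This is a minimal sketch, assuming the tables from the question. The polygon literal and the selected column are made up for illustration, since the original query was elided. Note that this usually makes scans slower, not faster.

```sql
-- Spark's built-in Parquet source: fall back to the row-based reader.
SET spark.sql.parquet.enableVectorizedReader=false;

-- Iceberg: vectorized Parquet reads are controlled by a table property.
ALTER TABLE iceberg_test
  SET TBLPROPERTIES ('read.parquet.vectorization.enabled'='false');

-- Hypothetical query; the predicate arguments in the question were not given.
EXPLAIN
SELECT h3
FROM test
WHERE ST_Intersects(geom, ST_GeomFromWKT('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))'));
```

With these settings the scan should emit rows directly, so no conversion node is needed. It does not make the predicate itself columnar: as far as I know, ST_Intersects is still evaluated row by row.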