
geoarrow::io::geojson fails with very sparse columns

Open • JosiahParry opened this issue 11 months ago • 1 comment

Reading the attached file with the GeoJSON reader fails when the batch size is smaller than 6807. I believe this is because the man_made column has only one non-null value.
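
One plausible mechanism, assuming (not confirmed here) that the reader infers each column's type from the first batch only: a column whose only non-null value lies beyond that batch is inferred as all-null, and the later string value then conflicts with it. The sketch below models this with hypothetical simplified `Value` and `ColType` enums; it is not the geoarrow-rs code. If the lone man_made value sat at row index 6806, this model would fail for exactly the batch sizes below 6807.

```rust
/// Simplified property value (hypothetical stand-in for a parsed JSON value).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Str(String),
    Num(f64),
}

/// Simplified column type (hypothetical; real readers use Arrow data types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Null, // only nulls seen so far: the real type is still unknown
    Utf8,
    Float64,
}

fn type_of(v: &Value) -> ColType {
    match v {
        Value::Null => ColType::Null,
        Value::Str(_) => ColType::Utf8,
        Value::Num(_) => ColType::Float64,
    }
}

/// Infers the column type from the first `batch_size` values only, then
/// checks every value against it. Returns the index of the first conflict.
fn read_with_first_batch_schema(column: &[Value], batch_size: usize) -> Result<ColType, usize> {
    let n = batch_size.min(column.len());
    let inferred = column[..n]
        .iter()
        .map(type_of)
        .find(|t| *t != ColType::Null)
        .unwrap_or(ColType::Null);
    for (i, v) in column.iter().enumerate() {
        let t = type_of(v);
        // Nulls fit any column type; anything else must match the inferred type.
        if t != ColType::Null && t != inferred {
            return Err(i);
        }
    }
    Ok(inferred)
}
```

With ten rows and a single string at index 7, any batch size up to 7 misses the value and fails at row 7, while a batch size of 8 or more infers Utf8.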

osm-edinburgh-central.geojson.zip

It may be nice to have something similar to readr::read_csv()'s guess_max, which specifies how many rows to scan when guessing column types. This would be similar to Polars' infer_schema_length.
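
A guess_max-style option could decouple schema inference from batch size: scan up to N features (or all of them), merge the types seen per property, and only then start building batches. This is a minimal sketch under assumptions; `infer_schema`, `max_records`, and the merge policy (null widens to anything, conflicting types fall back to Utf8) are hypothetical illustrations, not an existing geoarrow-rs API.

```rust
use std::collections::BTreeMap;

/// Simplified property value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
}

/// Simplified column type (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Null,
    Boolean,
    Float64,
    Utf8,
}

fn type_of(v: &Value) -> ColType {
    match v {
        Value::Null => ColType::Null,
        Value::Bool(_) => ColType::Boolean,
        Value::Num(_) => ColType::Float64,
        Value::Str(_) => ColType::Utf8,
    }
}

/// Null widens to any type; conflicting non-null types fall back to Utf8
/// (one possible policy; a reader could also raise an error instead).
fn merge(a: ColType, b: ColType) -> ColType {
    match (a, b) {
        (ColType::Null, t) | (t, ColType::Null) => t,
        (x, y) if x == y => x,
        _ => ColType::Utf8,
    }
}

/// Scans at most `max_records` feature property maps (all of them when
/// `None`, like infer_schema_length=None in Polars) and returns one merged
/// type per property name.
fn infer_schema(
    features: &[BTreeMap<String, Value>],
    max_records: Option<usize>,
) -> BTreeMap<String, ColType> {
    let limit = max_records.map_or(features.len(), |n| n.min(features.len()));
    let mut schema = BTreeMap::new();
    for props in &features[..limit] {
        for (name, value) in props {
            let entry = schema.entry(name.clone()).or_insert(ColType::Null);
            *entry = merge(*entry, type_of(value));
        }
    }
    schema
}
```

Scanning everything costs a full extra pass over the file, which is why both readr and Polars expose the scan length as a knob rather than always doing it.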

Additionally, it would be nice to have an option to skip records that fail to be read instead of failing the whole read.
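
A skip-on-error option could look roughly like the sketch below: each record is converted with a fallible function, and a policy flag decides whether the first failure aborts the read or is recorded and skipped. `OnError` and `read_records` are hypothetical names for illustration only; geoarrow-rs does not necessarily expose anything like them.

```rust
/// What to do when one record cannot be converted (hypothetical option).
#[derive(Debug, Clone, Copy, PartialEq)]
enum OnError {
    Fail,
    Skip,
}

/// Converts each raw record with `parse`. In `Skip` mode, bad records are
/// dropped and reported as (index, message) pairs instead of aborting.
fn read_records<T, E: std::fmt::Display>(
    raw: &[&str],
    parse: impl Fn(&str) -> Result<T, E>,
    on_error: OnError,
) -> Result<(Vec<T>, Vec<(usize, String)>), (usize, String)> {
    let mut rows = Vec::new();
    let mut skipped = Vec::new();
    for (i, rec) in raw.iter().copied().enumerate() {
        match parse(rec) {
            Ok(v) => rows.push(v),
            Err(e) => match on_error {
                OnError::Fail => return Err((i, e.to_string())),
                OnError::Skip => skipped.push((i, e.to_string())),
            },
        }
    }
    Ok((rows, skipped))
}
```

Returning the skipped indices alongside the data keeps the failures visible, so silently dropping rows is never the only outcome.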

JosiahParry, Dec 27 '24 19:12

GeoJSON is probably the weakest reader we have (aside from PostGIS), and essentially the only reader left where we have a pure-geozero implementation.

Ideally we'd be able to reuse arrow-json, but I think that would fail on the geometry column, since I don't think arrow-json currently supports union types.
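
Union types matter here because one GeoJSON file can mix geometry types (points, lines, polygons) in a single column. In Arrow's dense union layout, each slot stores a type id plus an offset into the child array for that type. The sketch below mimics that layout in plain Rust with a hypothetical two-variant `Geometry` enum; real GeoArrow children are nested coordinate arrays, simplified here to `Vec`s.

```rust
/// Simplified geometries (hypothetical; no multi-part or polygon types).
#[derive(Debug, Clone, PartialEq)]
enum Geometry {
    Point([f64; 2]),
    LineString(Vec<[f64; 2]>),
}

/// Dense-union-style storage: one type id and one offset per slot,
/// plus one child array per geometry type.
#[derive(Debug, Default)]
struct DenseGeometryUnion {
    type_ids: Vec<i8>,                // 0 = Point, 1 = LineString
    offsets: Vec<i32>,                // index into that slot's child array
    points: Vec<[f64; 2]>,            // child array for type id 0
    linestrings: Vec<Vec<[f64; 2]>>,  // child array for type id 1
}

impl DenseGeometryUnion {
    /// Appends a geometry to the child array for its type and records
    /// the type id and offset for the new slot.
    fn push(&mut self, g: &Geometry) {
        match g {
            Geometry::Point(p) => {
                self.type_ids.push(0);
                self.offsets.push(self.points.len() as i32);
                self.points.push(*p);
            }
            Geometry::LineString(ls) => {
                self.type_ids.push(1);
                self.offsets.push(self.linestrings.len() as i32);
                self.linestrings.push(ls.clone());
            }
        }
    }

    /// Reassembles slot `i` from its type id and offset.
    fn get(&self, i: usize) -> Geometry {
        let off = self.offsets[i] as usize;
        match self.type_ids[i] {
            0 => Geometry::Point(self.points[off]),
            1 => Geometry::LineString(self.linestrings[off].clone()),
            other => panic!("unknown type id {other}"),
        }
    }
}
```

Pushing a point, a line string, and another point yields type ids [0, 1, 0] and offsets [0, 0, 1]; a JSON reader without union support has no way to produce this shape for a single column.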

kylebarron, Dec 28 '24 05:12