
Mirror/surface arro3 equivalent parquet options

Open · H-Plus-Time opened this issue 8 months ago • 6 comments

It's admittedly a lot of boilerplate, so an alternative to just duplicating arro3's parquet options (and the writer option ingestion logic) would be ideal.

That being said, I might have missed something obvious and it's in fact already possible to do:

  1. Get the geoparquet-specific metadata out as something that can be shoved into the metadata property of an arro3 table.
  2. Feed that into arro3.io.write_parquet (or the ParquetWriter class, to each their own), specifying the (innumerable) parquet-specific options there.
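Step 1 above amounts to producing the `geo` key that the GeoParquet spec stores in the Parquet/Arrow schema metadata. Below is a minimal sketch of that step, assuming WKB encoding and a single geometry column. `build_geo_metadata` is a hypothetical helper, not a geoarrow-rs or arro3 API. The field names (`version`, `primary_column`, `columns`, `encoding`, `geometry_types`) follow the GeoParquet spec.

```python
import json

def build_geo_metadata(primary_column="geometry", encoding="WKB",
                       geometry_types=None, version="1.1.0"):
    """Return {b"geo": <JSON bytes>} in the shape of Arrow schema metadata.

    Hypothetical helper: it only assembles the GeoParquet "geo" JSON and does
    not inspect any data (no bbox, no CRS).
    """
    column_meta = {
        "encoding": encoding,
        # Per the spec, an empty list means the geometry types are not known.
        "geometry_types": list(geometry_types or []),
    }
    geo = {
        "version": version,
        "primary_column": primary_column,
        "columns": {primary_column: column_meta},
    }
    # Arrow schema metadata is a mapping of bytes keys to bytes values.
    return {b"geo": json.dumps(geo).encode("utf-8")}

meta = build_geo_metadata(geometry_types=["Point"])
print(json.loads(meta[b"geo"])["columns"]["geometry"]["encoding"])  # WKB
```

Step 2 would attach this mapping to the table's schema metadata and pass the table to `arro3.io.write_parquet` along with the desired parquet options. That part is not shown here because it depends on arro3's table/schema API.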

(Workarounds involving pyogrio are a bust due to round-tripping limitations with uint64, and apparently all unsigned integer, columns.)
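The uint64 problem can be illustrated without pyogrio. GDAL's OGR field types include a signed 64-bit integer but no unsigned one, so a uint64 column has to pass through a signed representation somewhere. The sketch below assumes the value is simply reinterpreted as a signed 64-bit integer. pyogrio's actual failure mode may differ; it could raise an error or perform a different lossy cast. `roundtrip_via_int64` is a hypothetical name used only for illustration.

```python
import struct

def roundtrip_via_int64(value):
    """Store an unsigned 64-bit value in a signed 64-bit slot and read it back."""
    raw = struct.pack("<Q", value)           # encode as little-endian uint64
    return struct.unpack("<q", raw)[0]       # reinterpret the same bytes as int64

INT64_MAX = 2**63 - 1
print(roundtrip_via_int64(INT64_MAX))   # 9223372036854775807 (survives)
print(roundtrip_via_int64(2**64 - 1))   # -1 (upper half of the uint64 range is lost)
```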

H-Plus-Time · Mar 26 '25 08:03

Are you specifically thinking about writing here?

As you might've seen in https://github.com/kylebarron/arro3/pull/313 I'm prototyping expanded APIs for reading Parquet. By exposing the full parquet crate API, and in particular the predicate and RowFilter APIs, you could in theory build an efficient GeoParquet reader in pure Python.
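In the Rust parquet crate, a row filter evaluates predicates that return a boolean mask per record batch. For a spatial query, a typical predicate is a bounding-box intersection test. The following library-free sketch shows such a predicate. It assumes the GeoParquet 1.1 bbox covering column with `xmin`/`ymin`/`xmax`/`ymax` fields. It does not use arro3's prototype API from the PR above, and the names `BBox` and `row_filter_mask` are hypothetical.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def intersects(a: BBox, b: BBox) -> bool:
    """Axis-aligned boxes intersect unless one lies entirely to one side of the other."""
    return not (a.xmax < b.xmin or a.xmin > b.xmax
                or a.ymax < b.ymin or a.ymin > b.ymax)

def row_filter_mask(bbox_rows, query: BBox):
    """Boolean mask over rows: True where a row's bbox intersects the query box."""
    return [intersects(BBox(**row), query) for row in bbox_rows]

rows = [
    {"xmin": 0.0, "ymin": 0.0, "xmax": 1.0, "ymax": 1.0},
    {"xmin": 5.0, "ymin": 5.0, "xmax": 6.0, "ymax": 6.0},
]
print(row_filter_mask(rows, BBox(0.5, 0.5, 2.0, 2.0)))  # [True, False]
```

A bbox hit only means the geometry *may* intersect the query. An exact geometric test would still be needed on the surviving rows.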

> It's admittedly a lot of boilerplate, so an alternative to just duplicating them (and the writer option ingestion logic) would be ideal.

You can see in pyo3-arrow and pyo3-object_store how I've written Python bindings once and exposed them to multiple Python libraries. But I don't think Parquet is valuable enough to justify the headache of maintaining a pyo3-parquet. So I think it's easier to just deal with the duplication across arro3 and geoarrow.

kylebarron · Mar 26 '25 14:03

Oops, yes, writing. In that case I'll get a PR up for that.

H-Plus-Time · Mar 26 '25 22:03

Maybe before spending the time on a fleshed out PR, can you more specifically describe the functionality you want?

kylebarron · Mar 26 '25 22:03

Right, ok - I personally need compression and (less critical) parquet format version permitted in the writer options, so:

geoarrow.rust.io.write_parquet(
  target_path, encoding, compression="zstd", parquet_format_version="parquet_2_0"
)
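Supporting these two options mostly means turning user-facing strings into validated writer settings. The sketch below assumes a particular accepted set: codec names loosely modeled on those the Rust parquet crate understands, an optional level in the form `zstd(3)`, and `parquet_1_0` / `parquet_2_0` mirroring the crate's `WriterVersion` variants. `parse_writer_options` is hypothetical and is not geoarrow-rs's actual ingestion code.

```python
import re

# Assumed accepted values; not taken from geoarrow-rs or arro3 source.
_CODECS_WITH_LEVEL = {"gzip", "brotli", "zstd"}
_CODECS_NO_LEVEL = {"uncompressed", "snappy", "lzo", "lz4", "lz4_raw"}
_FORMAT_VERSIONS = {"parquet_1_0", "parquet_2_0"}

def parse_writer_options(compression="zstd", parquet_format_version="parquet_2_0"):
    """Normalize writer options into (codec, level or None, format version)."""
    match = re.fullmatch(r"([a-z0-9_]+)(?:\((\d+)\))?", compression.strip().lower())
    if match is None:
        raise ValueError(f"unparseable compression: {compression!r}")
    codec, level = match.group(1), match.group(2)
    if codec not in _CODECS_WITH_LEVEL | _CODECS_NO_LEVEL:
        raise ValueError(f"unknown compression codec: {codec!r}")
    if codec in _CODECS_NO_LEVEL and level is not None:
        raise ValueError(f"{codec} does not take a compression level")
    version = parquet_format_version.strip().lower()
    if version not in _FORMAT_VERSIONS:
        raise ValueError(f"unknown parquet format version: {parquet_format_version!r}")
    return codec, (int(level) if level is not None else None), version

print(parse_writer_options("zstd", "parquet_2_0"))  # ('zstd', None, 'parquet_2_0')
print(parse_writer_options("ZSTD(3)"))              # ('zstd', 3, 'parquet_2_0')
```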

H-Plus-Time · Mar 26 '25 23:03

Maybe let's start with just those two exposed here?

kylebarron · Mar 28 '25 19:03

> Right, ok - I personally need compression and (less critical) parquet format version permitted in the writer options, so:
>
> geoarrow.rust.io.write_parquet(
>   target_path, encoding, compression="zstd", parquet_format_version="parquet_2_0"
> )

These are now exposed in the public Python GeoParquet writer API as of the 0.4 Python release.

kylebarron · Jul 03 '25 19:07