Mirror/surface arro3 equivalent parquet options
It's admittedly a lot of boilerplate, so an alternative to just duplicating them (and the writer option ingestion logic) would be ideal.
That being said, I might have missed something obvious and it's in fact already possible to do:
- Get the geoparquet-specific metadata out as something that can be shoved into the metadata property of an arro3 table.
- Feed that into arro3.io.write_parquet (or the ParquetWriter class, to each their own), specifying the (innumerable) parquet-specific options there.
Workarounds involving pyogrio are a bust because of round-tripping limitations with uint64 (and apparently all unsigned integer types).
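For context on the first bullet, here is a minimal sketch of the file-level metadata in question, assuming the `geo` key layout from the GeoParquet 1.0 specification. `build_geo_metadata` is a hypothetical helper written for illustration; it is not part of geoarrow-rust or arro3, and how the resulting dict gets attached to a table or writer is left open.

```python
import json


def build_geo_metadata(geometry_column="geometry", encoding="WKB", geometry_types=()):
    """Return key/value metadata holding a minimal GeoParquet 1.0 `geo` entry.

    Hypothetical helper: only the fields the spec marks as required are
    filled in; CRS, bbox and other optional fields are omitted.
    """
    geo = {
        "version": "1.0.0",
        "primary_column": geometry_column,
        "columns": {
            geometry_column: {
                "encoding": encoding,
                # An empty list means "geometry types unknown" in the spec.
                "geometry_types": list(geometry_types),
            }
        },
    }
    # Parquet key/value metadata is string-to-string, so the payload is JSON text.
    return {"geo": json.dumps(geo)}


metadata = build_geo_metadata(geometry_types=["Point"])
```

The returned dict has a single `geo` key whose value is a JSON string, which is the shape a Parquet writer's key/value metadata would need.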
Are you specifically thinking about writing here?
As you might've seen in https://github.com/kylebarron/arro3/pull/313, I'm prototyping expanded APIs for reading Parquet. By exposing the full parquet crate API, and in particular the predicate and RowFilter APIs, you could in theory build an efficient GeoParquet reader in pure Python.
It's admittedly a lot of boilerplate, so an alternative to just duplicating them (and the writer option ingestion logic) would be ideal.
You can see in pyo3-arrow and pyo3-object_store how I've written Python bindings once and exposed them to multiple Python libraries. But I don't think Parquet is valuable enough to justify the headache of maintaining a pyo3-parquet, so I think it's easier to just deal with the duplication across arro3 and geoarrow.
Oops, yes, writing. In that case I'll get a PR up for that.
Maybe before spending the time on a fleshed out PR, can you more specifically describe the functionality you want?
Right, ok - I personally need compression and (less critical) parquet format version permitted in the writer options, so:
geoarrow.rust.io.write_parquet(
target_path, encoding, compression="zstd", parquet_format_version="parquet_2_0"
)
Maybe let's start with just those two exposed here?
These are now exposed in the public Python GeoParquet writer API as of the 0.4 Python release.
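For illustration only, the "writer option ingestion logic" discussed above boils down to validating a couple of strings before they reach the writer. This is a minimal sketch under stated assumptions, not the geoarrow-rust implementation: `WriterOptions` and `parse_writer_options` are hypothetical names, the defaults are assumed, and `"parquet_1_0"` is assumed as the counterpart of the `"parquet_2_0"` value from the example call. The codec names are those defined by the Parquet format.

```python
from dataclasses import dataclass

# Compression codecs defined by the Parquet format.
PARQUET_CODECS = frozenset(
    {"uncompressed", "snappy", "gzip", "lzo", "brotli", "lz4", "zstd", "lz4_raw"}
)
# Assumed spellings; only "parquet_2_0" appears in the thread.
FORMAT_VERSIONS = frozenset({"parquet_1_0", "parquet_2_0"})


@dataclass(frozen=True)
class WriterOptions:
    """Hypothetical container for the two options requested in the thread."""

    compression: str = "uncompressed"  # assumed default
    parquet_format_version: str = "parquet_1_0"  # assumed default


def parse_writer_options(compression=None, parquet_format_version=None):
    """Normalize and validate the two options, falling back to the defaults."""
    defaults = WriterOptions()
    codec = (compression or defaults.compression).lower()
    if codec not in PARQUET_CODECS:
        raise ValueError(f"unknown compression codec: {compression!r}")
    version = (parquet_format_version or defaults.parquet_format_version).lower()
    if version not in FORMAT_VERSIONS:
        raise ValueError(f"unknown Parquet format version: {parquet_format_version!r}")
    return WriterOptions(codec, version)


options = parse_writer_options(compression="zstd", parquet_format_version="parquet_2_0")
```

Keeping validation in one small function like this is one way to limit the duplication between two libraries to a few lines each.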