diving-into-pygeoapi

add next generation formats to exercises

Open tomkralidis opened this issue 6 months ago • 6 comments

Add content to exercises for data publishing.

  • GeoParquet
  • Zarr
  • FlatGeobuf
  • PyArrow

tomkralidis, May 30 '25 13:05

GeoParquet needs additional dependencies on GDAL. What would be the best approach?

pvgenuchten, May 31 '25 05:05

I think pygeoapi only requires pyarrow and dependencies such as GeoPandas, but GeoPandas in turn depends on GDAL. It looks like pyarrow is not in the pygeoapi Docker image (requirements-provider.txt is not included).

I hope this issue is not in the wrong repo: at yesterday's "Doing Geospatial in Python" workshop meeting we talked about including "new/upcoming" formats, if feasible, after investigation. OK, I see there is also https://github.com/geopython/geopython-workshop/issues/193. Open for suggestions.

justb4, May 31 '25 15:05

Did some research on accessing (Geo)Parquet with (geo)arrow. See https://github.com/justb4/parquet-research; most of it is in the README.md.

Findings, among others:

  • GDAL not required
  • The pygeoapi Dockerfile only needs to include python3-arrow and GeoPandas to support the Parquet Provider. Pandas is already in the Docker image. Not sure what GeoPandas adds (3.5 MB of Python), but it also depends on Fiona (?).
  • We held off on including pyarrow because it would add around 110 MB; see the PR review using wheels. Maybe python3-arrow is more compact...
  • So we could make a simple exercise in which we add a Parquet Provider.
  • The Overture Maps Python CLI allows quickly downloading data by bounding box (BBOX), e.g. for the centre of Mostar.
  • The Python programming examples are better suited to https://github.com/geopython/geopython-workshop/issues/193
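
To make the BBOX download idea concrete, here is a minimal sketch of how an exercise could derive a bounding box around Mostar and assemble the CLI call. The centre coordinates are approximate, and the helper names (`bbox_around`, `overture_download_args`), the 1 km half-size, the `building` feature type and the output file name are all assumptions for illustration. The command-line pattern follows the Overture Maps CLI README as I understand it; check the flags against the installed version before relying on them.

```python
import math

# Approximate centre of Mostar (assumption, not taken from the thread).
MOSTAR_LON, MOSTAR_LAT = 17.8078, 43.3438


def bbox_around(lon, lat, half_size_m=1000.0):
    """Return (west, south, east, north) in degrees for a square around a point."""
    # Roughly 111,320 m per degree of latitude; a degree of longitude
    # shrinks with the cosine of the latitude.
    dlat = half_size_m / 111_320.0
    dlon = half_size_m / (111_320.0 * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def overture_download_args(bbox, out_path, feature_type="building"):
    """Build the argument list for an `overturemaps download` call (hypothetical helper)."""
    bbox_arg = ",".join(f"{value:.4f}" for value in bbox)
    return [
        "overturemaps", "download",
        f"--bbox={bbox_arg}",
        "-f", "geoparquet",
        f"--type={feature_type}",
        "-o", out_path,
    ]


bbox = bbox_around(MOSTAR_LON, MOSTAR_LAT)
args = overture_download_args(bbox, "mostar-buildings.parquet")

# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` once the CLI is installed; the downloaded file would then be the input for a Parquet Provider exercise.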

justb4, Jun 01 '25 15:06

More research: geoarrow-rs (Rust) with Python bindings, by Kyle Barron (@kylebarron) et al., looks like a lighter-weight, modular alternative. We already discussed it with him while working on the pygeoapi Parquet Provider PR, and the project is now more mature. The footprint is about 28 MB, plus 6.8 MB for arro3, a minimal Apache Arrow dependency package. The other libraries are standard and/or already included in pygeoapi's dependencies, such as pyproj.

du -sh /Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/*
6.8M	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/arro3
 16K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/arro3_core-0.5.1.dist-info
308K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/certifi
 24K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/certifi-2025.4.26.dist-info
928K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/charset_normalizer
 60K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/charset_normalizer-3.4.2.dist-info
 28M	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/geoarrow     <=========
 16K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/geoarrow_rust_io-0.3.0.dist-info
648K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/idna
 28K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/idna-3.10.dist-info
 12M	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/pip
104K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/pip-25.1.1.dist-info
 18M	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/pyproj
 68K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/pyproj-3.7.1.dist-info
472K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/requests
 36K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/requests-2.32.3.dist-info
984K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/urllib3
 28K	/Users/just/.pyenv/versions/geoarrow-rs/lib/python3.12/site-packages/urllib3-2.4.0.dist-info
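
As a rough check of the footprint comparison, the sketch below sums the two relevant entries from the listing above and relates them to the roughly 110 MB quoted earlier for pyarrow. The paths are shortened to the package directory name, only four entries are copied, and `du -sh` units are treated as MiB, which is an assumption; the 110 figure is the estimate from this thread, not a measurement.

```python
# A few entries copied from the `du -sh` listing, paths shortened to the
# package directory name.
DU_LINES = """\
6.8M\tarro3
28M\tgeoarrow
12M\tpip
18M\tpyproj
"""

# Conversion factors from du's size suffixes to MiB.
UNIT_TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_du(text):
    """Map each directory name to its size in MiB."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        size, path = line.split(None, 1)
        name = path.rsplit("/", 1)[-1].strip()
        sizes[name] = float(size[:-1]) * UNIT_TO_MIB[size[-1]]
    return sizes


sizes = parse_du(DU_LINES)

# geoarrow-rs bindings plus the arro3 Arrow package they depend on.
geoarrow_stack = sizes["geoarrow"] + sizes["arro3"]

# Estimate quoted in this thread for pyarrow (assumption, not measured here).
PYARROW_ESTIMATE = 110.0

print(f"geoarrow + arro3: {geoarrow_stack:.1f} MiB")
print(f"share of the pyarrow estimate: {geoarrow_stack / PYARROW_ESTIMATE:.0%}")
```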

justb4, Jun 02 '25 14:06

2025-07-01

  • Zarr: @tomkralidis
  • GeoParquet: @justb4

tomkralidis, Jul 01 '25 14:07

By the way, I just released a new version of the Python bindings to geoarrow-rs, v0.4: https://geoarrow.org/geoarrow-rs/python/latest/

kylebarron, Jul 01 '25 16:07