duckdb_spatial icon indicating copy to clipboard operation
duckdb_spatial copied to clipboard

Support SRS for geoparquet output

Open palmerj opened this issue 5 months ago • 6 comments

Hi would it be possible to support setting the SRS output for a parquet file so the JSON metadata->geo->crs key is populated?

I see this comment that looks promising and would provide a nice way to do it:

COPY (
    SELECT ST_SetSRID(ST_Point(1434967.0, 15026561.0), 'epsg:2193')
) TO 'geotest.parquet'
WITH (
    FORMAT 'PARQUET',
    COMPRESSION zstd,
    COMPRESSION_LEVEL 9,
    ROW_GROUP_SIZE 500
);

or maybe an SRS explicit option:

COPY (
    SELECT ST_Point(1434967.0, 15026561.0)
) TO 'geotest.parquet'
WITH (
    FORMAT 'PARQUET',
    COMPRESSION zstd,
    COMPRESSION_LEVEL 9,
    ROW_GROUP_SIZE 500,
    SRS 'EPSG:2193',
);

Another option, which might be easier to implement, is to compile in support for the GDAL geoparquet driver. That driver already supports setting the CRS key in the JSON metadata of the parquet file.

COPY (
    SELECT ST_SetSRID(ST_Point(1434967.0, 15026561.0), 'epsg:2193')
) TO 'geotest.parquet'
WITH (
    FORMAT gdal,
    DRIVER 'parquet',
    LAYER_CREATION_OPTIONS ('WRITE_COVERING_BBOX=YES' , 'COMPRESSION=ZSTD')
    SRS 'EPSG:2193',
);

Many thanks!

palmerj avatar Jul 17 '25 19:07 palmerj

@Maxxen sorry to ask, do you have any view on this, or is it planned to be fixed in the roadmap?

palmerj avatar Jul 28 '25 11:07 palmerj

Hello! Im currently overhauling how types work in general in DuckDB, and how the extension types and DuckDB GEOMETRY type works in particular, with the goal of being able to attach extra metadata (such as CRS) to a column, which will enable passing SRS/CRS information through and from external formats automatically. As soon as that is done I'll make sure it works for GeoParquet.

Maxxen avatar Jul 28 '25 19:07 Maxxen

Oh wow. Very exciting. Thank you for the update.

palmerj avatar Jul 29 '25 20:07 palmerj

Just adding a +1 that it would be great to have the CRS on the column metadata. I want the data stored in its original crs but use st_transform(..., original, 'EPSG:3857') occasionally to visualize it. Currently I'm keeping track of the source_crs somewhere else, which is a bit of additional orchestration.

mhkeller avatar Sep 30 '25 12:09 mhkeller

This is planned to be resolved in DuckDB v1.5

Maxxen avatar Sep 30 '25 12:09 Maxxen

It would be nice if ST_Read_Meta can support reading CRS from geoparquet, in addition to other vector formats like geojson, shapefile, gpkg.

con.sql("""
SELECT CONCAT(layers[1].geometry_fields[1].crs.auth_name, ':', layers[1].geometry_fields[1].crs.auth_code) AS crs_string
FROM ST_Read_Meta('earthquakes.parquet')
""").fetchone()[0]

'EPSG:4326'

giswqs avatar Nov 18 '25 17:11 giswqs