Support SRS for geoparquet output
Hi, would it be possible to support setting the output SRS for a Parquet file, so that the JSON metadata->geo->crs key is populated?
I see this comment that looks promising and would provide a nice way to do it:
COPY (
SELECT ST_SetSRID(ST_Point(1434967.0, 15026561.0), 'epsg:2193')
) TO 'geotest.parquet'
WITH (
FORMAT 'PARQUET',
COMPRESSION zstd,
COMPRESSION_LEVEL 9,
ROW_GROUP_SIZE 500
);
or maybe an SRS explicit option:
COPY (
SELECT ST_Point(1434967.0, 15026561.0)
) TO 'geotest.parquet'
WITH (
FORMAT 'PARQUET',
COMPRESSION zstd,
COMPRESSION_LEVEL 9,
ROW_GROUP_SIZE 500,
SRS 'EPSG:2193'
);
Another option, which might be easier to implement, is to compile in support for the GDAL GeoParquet driver. That driver already supports setting the CRS key in the JSON metadata of the Parquet file.
COPY (
SELECT ST_SetSRID(ST_Point(1434967.0, 15026561.0), 'epsg:2193')
) TO 'geotest.parquet'
WITH (
FORMAT gdal,
DRIVER 'parquet',
LAYER_CREATION_OPTIONS ('WRITE_COVERING_BBOX=YES', 'COMPRESSION=ZSTD'),
SRS 'EPSG:2193'
);
Many thanks!
@Maxxen sorry to ask, but do you have any view on this, or is a fix planned on the roadmap?
Hello! I'm currently overhauling how types work in general in DuckDB, and how the extension types and the DuckDB GEOMETRY type work in particular, with the goal of being able to attach extra metadata (such as CRS) to a column. That will enable passing SRS/CRS information to and from external formats automatically. As soon as that is done I'll make sure it works for GeoParquet.
Oh wow. Very exciting. Thank you for the update.
Just adding a +1 that it would be great to have the CRS in the column metadata. I want the data stored in its original CRS but occasionally use st_transform(..., original, 'EPSG:3857') to visualize it. Currently I'm keeping track of the source_crs somewhere else, which adds a bit of orchestration.
This is planned to be resolved in DuckDB v1.5
It would be nice if ST_Read_Meta could support reading the CRS from GeoParquet, in addition to other vector formats like GeoJSON, Shapefile, and GPKG.
con.sql("""
SELECT CONCAT(layers[1].geometry_fields[1].crs.auth_name, ':', layers[1].geometry_fields[1].crs.auth_code) AS crs_string
FROM ST_Read_Meta('earthquakes.parquet')
""").fetchone()[0]
'EPSG:4326'
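For anyone who needs this before ST_Read_Meta covers GeoParquet, here is a rough sketch of the lookup itself in plain Python. It assumes the JSON string stored under the `geo` key has already been pulled out of the Parquet footer with some Parquet reader (not shown). `SAMPLE_GEO` is an abbreviated, made-up stand-in for that string (a real `crs` value is a full PROJJSON object), and `crs_string` is a hypothetical helper name, not part of DuckDB or GDAL.

```python
import json

# Abbreviated, hypothetical stand-in for the JSON stored under the "geo" key
# of a GeoParquet file. A real "crs" value is a complete PROJJSON object.
SAMPLE_GEO = json.dumps({
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Point"],
            "crs": {
                "type": "ProjectedCRS",
                "name": "NZGD2000 / New Zealand Transverse Mercator 2000",
                "id": {"authority": "EPSG", "code": 2193},
            },
        }
    },
})


def crs_string(geo_json, column=None):
    """Return 'AUTHORITY:CODE' for a geometry column, or None if unknown.

    Hypothetical helper: reads only the PROJJSON "id" member and does not
    attempt to identify a CRS that lacks one.
    """
    meta = json.loads(geo_json)
    col = meta["columns"][column or meta["primary_column"]]
    if "crs" not in col:
        # GeoParquet: a missing "crs" key means the default, OGC:CRS84.
        return "OGC:CRS84"
    crs = col["crs"]
    if crs is None:
        # An explicit null means the CRS is unknown.
        return None
    ident = crs.get("id")
    if ident is None:
        return None
    return f"{ident['authority']}:{ident['code']}"


print(crs_string(SAMPLE_GEO))
```

The result has the same shape as the auth_name/auth_code concatenation above, so it could be passed to st_transform as the source CRS instead of tracking it separately.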