Support parquet and feather datasets from GeoPandas

Open Calychas opened this issue 2 years ago • 7 comments

Description

Add two GeoPandas-based datasets, one for the Feather format and one for Parquet, to make loading and saving GeoDataFrames with geometries easier.

Context

GeoPandas is very useful when working with geospatial data, but at the moment the kedro-datasets plugin only supports GeoJsonDataset. It is possible to load and save Parquet and Feather files with geometries using the standard pandas datasets, but the geometry data then needs special treatment afterwards (e.g. parsing WKB and creating the GeoDataFrame manually).

Possible Implementation

Take the implementation of the existing geopandas.GeoJsonDataset as a starting point, then create geopandas.ParquetDataSet and geopandas.FeatherDataSet. I already have a working implementation privately and can try to add it here.

Calychas avatar Apr 27 '23 10:04 Calychas

@Calychas Awesome, would you be able to create a PR?

noklam avatar Apr 27 '23 10:04 noklam

@noklam Yes, I will try to do that sometime in the next 2 weeks.

Calychas avatar Apr 27 '23 10:04 Calychas

Awesome, thank you @Calychas!

SajidAlamQB avatar Apr 27 '23 10:04 SajidAlamQB

@Calychas Hey! Pinging to see if you still have some bandwidth to add support for this :)

noklam avatar Jun 29 '23 13:06 noklam

Hey @noklam! Sorry for the delay, I just came back to this task.

Calychas avatar Jul 04 '23 17:07 Calychas

Which dataset would you consider the gold standard implementation to base the new datasets on? Looking over the existing datasets, I sometimes see minor discrepancies between them, e.g. some use BytesIO and some do not.

Calychas avatar Jul 04 '23 19:07 Calychas

@Calychas CSVDataSet is probably the most common one.

noklam avatar Jul 04 '23 20:07 noklam