kedro-plugins
Support parquet and feather datasets from GeoPandas
Description
Two datasets for feather and parquet formats based on GeoPandas, which will make loading and saving GeoDataFrames with geometries easier.
Context
GeoPandas is of great use when operating on geospatial data, but at the moment the kedro-datasets plugin only supports GeoJsonDataset. It is possible to load and save parquet and feather files with geometries using the standard pandas datasets, but the geometry data needs special treatment afterwards (e.g. parsing WKB and creating a GeoDataFrame manually).
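For illustration, the manual step looks roughly like the sketch below. It assumes the geometry column comes back from the pandas dataset as raw WKB bytes; the helper name wkb_frame_to_geodataframe, the column name "geometry" and the EPSG:4326 CRS are made up for this example, and the small in-memory frame stands in for the result of reading a real file.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point


def wkb_frame_to_geodataframe(df, geometry_col="geometry", crs=None):
    """Turn a plain DataFrame holding WKB bytes into a GeoDataFrame.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    # Decode the WKB bytes into shapely geometries.
    geometry = gpd.GeoSeries.from_wkb(df[geometry_col], crs=crs)
    # Rebuild the frame with the decoded column as the active geometry.
    return gpd.GeoDataFrame(df.drop(columns=[geometry_col]), geometry=geometry)


# Stand-in for what a pandas dataset would return: geometry as raw WKB bytes.
raw = pd.DataFrame(
    {
        "name": ["a", "b"],
        "geometry": [Point(1, 2).wkb, Point(3, 4).wkb],
    }
)

# The CRS is not recovered automatically here, so it is passed in (assumed value).
gdf = wkb_frame_to_geodataframe(raw, crs="EPSG:4326")
```

Every node that consumes such a dataset has to repeat this conversion (and the reverse before saving), which is what dedicated datasets would avoid.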
Possible Implementation
Take the existing geopandas.GeoJsonDataset implementation as a base, then create geopandas.ParquetDataSet and geopandas.FeatherDataSet. I already have a working implementation privately and I can try to add it here.
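A rough sketch of the shape such a dataset could take is below. It is not the private implementation mentioned above and not the kedro-datasets code: the class name GeoParquetDatasetSketch is hypothetical, and it is a plain class rather than a subclass of Kedro's abstract dataset so that it stays self-contained. Filesystem handling and versioning are left out.

```python
from typing import Any, Dict, Optional


class GeoParquetDatasetSketch:
    """Hypothetical stand-in for a GeoPandas-backed parquet dataset."""

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._filepath = filepath
        # Copy the argument dicts so later mutation by the caller has no effect.
        self._load_args = dict(load_args or {})
        self._save_args = dict(save_args or {})

    def _load(self):
        # Imported lazily so the class can be defined without the geo stack installed.
        import geopandas as gpd

        # Returns a GeoDataFrame with geometries already decoded.
        return gpd.read_parquet(self._filepath, **self._load_args)

    def _save(self, data) -> None:
        # `data` is expected to be a GeoDataFrame.
        data.to_parquet(self._filepath, **self._save_args)

    def _describe(self) -> Dict[str, Any]:
        return {
            "filepath": self._filepath,
            "load_args": self._load_args,
            "save_args": self._save_args,
        }


# Example path and arguments are placeholders.
dataset = GeoParquetDatasetSketch(
    "data/regions.parquet", save_args={"compression": "snappy"}
)
description = dataset._describe()
```

A feather variant would presumably differ only in calling gpd.read_feather and GeoDataFrame.to_feather in place of the parquet functions.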
@Calychas Awesome, would you be able to create a PR?
@noklam Yes, I will try to do that sometime in the following 2 weeks
Awesome, thank you @Calychas!
@Calychas Hey! Pinging to see if you still have some bandwidth to add support for this :)
Hey @noklam! Sorry for the delay, just came back to this task
Which dataset would you consider the gold standard to base the new datasets on? Looking over the existing datasets, I sometimes see minor discrepancies between them, e.g. some use BytesIO and some do not.
@Calychas CSVDataSet is probably the most common one.