BUG: failure if manually specifying engine="pyarrow" in to_parquet

Open jorisvandenbossche opened this issue 3 years ago • 1 comment

I just noticed that when the argument engine="pyarrow" is provided to to_parquet() the write still fails with the same error.

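import numpy as np
import pandas as pd
import geopandas as gpd
import dask_geopandas as dgpd

# pd.util.testing.makeDataFrame() is no longer available in recent pandas
# releases, so build an equivalent small frame of random floats explicitly.
dft = pd.DataFrame(np.random.randn(30, 4), columns=list("ABCD"))
dft["geometry"] = gpd.points_from_xy(dft.A, dft.B)
df = gpd.GeoDataFrame(dft)
df = dgpd.from_geopandas(df, npartitions=1)
# Passing engine="pyarrow" explicitly is what triggers the failure.
df.to_parquet("mydf.parquet", engine="pyarrow")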

Originally posted by @FlorisCalkoen in https://github.com/geopandas/dask-geopandas/issues/198#issuecomment-1213201353

jorisvandenbossche, Aug 12 '22 18:08

Ah, that is "expected", because you are then using dask's built-in "pyarrow" engine, and we actually extend that engine to handle the geometry dtype properly.

But of course, we should prevent people from accidentally passing engine="pyarrow" and thereby silently overriding our own engine. It seems we need something more elaborate than the simple partial to do that:

https://github.com/geopandas/dask-geopandas/blob/2fd1646ab4f3c0f415802151abd6e5cb5f1e0155/dask_geopandas/io/parquet.py#L97-L98
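To illustrate why a plain partial is not enough, here is a minimal, self-contained sketch. It does not use dask or dask-geopandas: `base_to_parquet`, `GeoEngine`, and `to_parquet_guarded` are hypothetical stand-ins invented for this example, and mapping engine="pyarrow" back to the geometry-aware engine is only one possible policy (raising an error would be another), not necessarily what the project should adopt.

```python
from functools import partial


class GeoEngine:
    """Hypothetical placeholder for the geometry-aware engine."""


def base_to_parquet(df, path, engine="pyarrow", **kwargs):
    """Stand-in for the upstream writer: returns the engine it was given."""
    return engine


# A partial only supplies a default keyword; a keyword passed by the caller
# silently replaces it, which is the problem described in this issue.
to_parquet_partial = partial(base_to_parquet, engine=GeoEngine)


def to_parquet_guarded(df, path, *, engine=None, **kwargs):
    """Wrapper that keeps the geometry-aware engine in control.

    Assumed policy: None, "auto" and "pyarrow" all resolve to GeoEngine,
    subclasses of GeoEngine are accepted, anything else is rejected.
    """
    if engine in (None, "auto", "pyarrow"):
        engine = GeoEngine
    elif not (isinstance(engine, type) and issubclass(engine, GeoEngine)):
        raise ValueError(f"unsupported engine for geometry data: {engine!r}")
    return base_to_parquet(df, path, engine=engine, **kwargs)


# The partial lets the caller bypass GeoEngine without any warning.
overridden = to_parquet_partial(None, "mydf.parquet", engine="pyarrow")

# The guarded wrapper resolves the same call back to GeoEngine.
guarded = to_parquet_guarded(None, "mydf.parquet", engine="pyarrow")
```

With the partial, the caller's keyword wins and the built-in engine is used; with the explicit wrapper, the engine argument is inspected before it reaches the underlying writer.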

jorisvandenbossche, Aug 12 '22 18:08