spatialpandas
DaskGeoDataFrame parquet write error - Series object has no attribute total_bounds
Hi - I'm running into an error when trying to write a DaskGeoDataFrame to Parquet. I'm following the basic pattern here (see also) but using a smaller sample of a point dataset. Everything seems to run as expected until I try to write out the packed file, at which point I encounter the error below.
ALL software version info
pyarrow=15.0.0, spatialpandas=0.4.10, pandas=2.1.1, dask=2024.2.0, python=3.9.16
df = df.pack_partitions(npartitions=df.npartitions, shuffle='disk')
df.to_parquet(save_path)
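As far as I understand, `pack_partitions` reorders rows along a Hilbert space-filling curve and then repartitions, so that spatially nearby geometries end up in the same partition. A minimal stdlib sketch of that idea for point data, assuming plain `(x, y)` tuples; `hilbert_distance`, `pack_points` and the grid resolution `p` are hypothetical names for illustration, not spatialpandas' actual implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hilbert_distance(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two; this is the classic xy2d algorithm.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_points(points: Sequence[Point], npartitions: int, p: int = 10) -> List[List[Point]]:
    """Hypothetical helper: sort points along a Hilbert curve, then split
    them into at most `npartitions` roughly equal chunks."""
    n = 2 ** p
    xs = [pt[0] for pt in points]
    ys = [pt[1] for pt in points]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    # Scale relative to the overall bounds so negative coordinates
    # (e.g. western longitudes) still land in [0, n - 1].
    xspan = (xmax - xmin) or 1.0
    yspan = (ymax - ymin) or 1.0

    def curve_key(pt: Point) -> int:
        gx = int((pt[0] - xmin) / xspan * (n - 1))
        gy = int((pt[1] - ymin) / yspan * (n - 1))
        return hilbert_distance(n, gx, gy)

    ordered = sorted(points, key=curve_key)
    size = -(-len(ordered) // npartitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


pts = [(-122.4, 37.8), (-73.9, 40.7), (2.35, 48.86),
       (139.7, 35.7), (-0.13, 51.5), (151.2, -33.9)]
parts = pack_points(pts, npartitions=3)
```

Scaling by the overall bounds keeps negative coordinates inside the grid, and sorting by curve position tends to keep nearby points in the same chunk.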
I was able to write a small file without error, but I still encounter the error with a larger dataset.
I re-ran on a different system with pandas 2.2.1 and again with pandas 1.5.3, and encountered the error each time. Any ideas are appreciated. Here is a more complete stack trace:
If there is only one DataFrame partition, saving works fine; if there is more than one partition, this error is raised.
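With a single partition, the total bounds are just that partition's bounds; with several, a `total_bounds`-style property has to combine per-partition bounds, so the multi-partition path exercises extra code. A minimal stdlib sketch of that reduction, assuming bounds as `(minx, miny, maxx, maxy)` tuples; `partition_bounds` and `combine_bounds` are hypothetical helpers, not spatialpandas functions, and this does not locate the bug.

```python
from typing import Iterable, List, Tuple

# (minx, miny, maxx, maxy)
Bounds = Tuple[float, float, float, float]


def partition_bounds(points: Iterable[Tuple[float, float]]) -> Bounds:
    """Hypothetical helper: bounding box of one partition's points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def combine_bounds(per_partition: List[Bounds]) -> Bounds:
    """Hypothetical helper: reduce per-partition boxes to one overall box."""
    return (
        min(b[0] for b in per_partition),
        min(b[1] for b in per_partition),
        max(b[2] for b in per_partition),
        max(b[3] for b in per_partition),
    )


partitions = [
    [(-122.4, 37.8), (-73.9, 40.7)],   # negative longitudes only
    [(2.35, 48.86), (139.7, 35.7)],    # positive longitudes only
]
total = combine_bounds([partition_bounds(p) for p in partitions])
```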
I would guess that this was implemented with fastparquet, which has now been dropped by Dask. Can you try downgrading the Dask version to something like 2020 and see if that works with/without fastparquet?
Thanks for that idea @Hoxbro. I downgraded dask to 2020 but it returns the same error.
So far in looking into the issue, I found that any call to df.geometry.total_bounds after df.pack_partitions() raises the error. However, you can call the total_bounds property any number of times before packing partitions and it returns correctly.
Did you try to set the parquet backend to fastparquet?
I did try fastparquet (same error). However, I don't think it's related to that or to saving directly. Something happens in pack_partitions that causes any later call to the geometry.total_bounds property to fail. It fails at save because to_parquet calls that property.
I found a trigger condition for the error - it occurs when one or more longitudes are negative. I attached a simple notebook that reproduces the error. If you change the negative longitude to positive the error is resolved. Not sure where to look in the code to patch this. Thanks! sp_error_example.ipynb.txt
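To repeat the sign experiment on the full dataset without editing individual rows, one option is to shift longitudes from [-180, 180) into [0, 360) before constructing the geometry column, then shift them back after reading. A minimal sketch, assuming plain float longitudes; `to_0_360` and `to_pm180` are hypothetical helpers. This only tests whether negative values are the trigger; it changes the stored coordinates and is not a fix for the underlying bug.

```python
def to_0_360(lon: float) -> float:
    """Hypothetical helper: map a longitude in [-180, 180) to [0, 360)."""
    # Python's % returns a result with the sign of the divisor, so -170 -> 190.
    return lon % 360.0


def to_pm180(lon: float) -> float:
    """Hypothetical helper: inverse of to_0_360, mapping [0, 360) to [-180, 180)."""
    return lon - 360.0 if lon >= 180.0 else lon


lons = [-122.4, -0.13, 2.35, 139.7, -180.0]
shifted = [to_0_360(v) for v in lons]   # no negative values left
restored = [to_pm180(v) for v in shifted]
```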