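`dataframe_to_arrow` returns a table that doesn't convert the geopandas index correctly
A lot of text below, but I'll highlight the main difference first. Notice that our version has an extra level of nested brackets: each value sits in its own inner list (in pyarrow's printed output, each inner list is one chunk of the column).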
# Our dataframe_to_arrow returns the following column
geometry: [[0101...F03F],[0101...0040]]
# But geopandas returns this.
geometry: [[0101...F03F,0101...0040]]
This happens for the index column (`__index_level_0__`) too, which leads to it being misinterpreted as a regular column instead of being read in as the index when calling `gpd.GeoDataFrame.from_arrow()`.
# Sedona returns
__index_level_0__ geometry
0 1 POINT (1 1)
1 2 POINT (2 2)
# Geopandas returns this
geometry
1 POINT (1 1)
2 POINT (2 2)
Full script and output below.
import geopandas as gpd
import pandas as pd
import pyarrow as pa
import sedona.geopandas as sgpd
from sedona.spark.geoarrow.geoarrow import dataframe_to_arrow
from shapely.geometry import Point
sgpd_df = sgpd.GeoDataFrame({"geometry": [Point(1, 1), Point(2, 2)]}, index=pd.Index([1, 2]))
spark_df = sgpd_df._internal.spark_frame.drop("__natural_order__") # don't worry about this drop
sgpd_arrow = dataframe_to_arrow(spark_df)
gpd_df = gpd.GeoDataFrame({"geometry": [Point(1, 1), Point(2, 2)]}, index=pd.Index([1, 2]))
gpd_arrow = pa.table(gpd_df.to_arrow())
assert type(sgpd_arrow) == type(gpd_arrow) == pa.Table
print("SEDONA\n", sgpd_arrow, "\n")
gpd_df_from_sgpd_arrow = gpd.GeoDataFrame.from_arrow(sgpd_arrow)
print(gpd_df_from_sgpd_arrow, "\n")
print("GEOPANDAS\n", gpd_arrow, "\n")
gpd_df_from_gpd_arrow = gpd.GeoDataFrame.from_arrow(gpd_arrow)
print(gpd_df_from_gpd_arrow)
SEDONA
pyarrow.Table
__index_level_0__: int64
geometry: extension<geoarrow.wkb<WkbType>>
----
__index_level_0__: [[1],[2]]
geometry: [[0101000000000000000000F03F000000000000F03F],[010100000000000000000000400000000000000040]]
__index_level_0__ geometry
0 1 POINT (1 1)
1 2 POINT (2 2)
GEOPANDAS
pyarrow.Table
geometry: extension<geoarrow.wkb<WkbType>>
__index_level_0__: int64
----
geometry: [[0101000000000000000000F03F000000000000F03F,010100000000000000000000400000000000000040]]
__index_level_0__: [[1,2]]
geometry
1 POINT (1 1)
2 POINT (2 2)