pandapower
[bug] Serialization of shapely objects in dataframes creates "intermediate" products
Bug report checklist
- [X] Searched the issues page for similar reports
- [X] Read the relevant sections of the documentation
- [X] Browse the tutorials and tests for useful code snippets and examples of use
- [X] Reproduced the issue after updating with pip install --upgrade pandapower (or git pull)
- [X] Tried basic troubleshooting (if a bug/error) like restarting the interpreter and checking the pythonpath
Reproducible Example
import pandas as pd
import pandapower as pp
import shapely
df = pd.DataFrame({"a": [1, 2], "b": [shapely.Point([1, 4]), shapely.LineString([[1, 2], [4, 6]])]})
json_str = pp.to_json(df)
df2 = pp.from_json_string(json_str)
print(df)
print(df2)
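# Second case: shapely objects in column "b" next to the active geometry column "c"
import geopandas as gpd
df2 = pd.DataFrame({"a": [1, 2], "b": [shapely.Point([1, 4]), shapely.LineString([[1, 2], [4, 6]])], "c": [shapely.Point([1, 9]), shapely.LineString([[1, 2], [4, 4]])]})
gdf = gpd.GeoDataFrame(df2, geometry="c")
json_str_gdf = pp.to_json(gdf)
# Deserialize the GeoDataFrame string, not the json_str from the first case
gdf2 = pp.from_json_string(json_str_gdf)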
Issue Description and Traceback
When running the above code, the shapely objects are converted into pandapower's internal serialization format. Upon deserialization, they are not converted back; each one remains a dict with bookkeeping entries such as "_module" and "_class". I assume the reason is that we pass the pandapower to_serializable handler as default_handler to pandas during serialization, but cannot hand over a registry or decode hook during deserialization. Is that correct? Do you have any idea how to overcome this problem?
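The suspected asymmetry can be reproduced with the standard library alone. The following is a minimal sketch, not pandapower's actual implementation: `Pt` is a hypothetical stand-in for a shapely geometry, and the `_object` key, the `tag` and `revive` functions, and the `REGISTRY` mapping are assumptions made for illustration. A fallback handler tags the object while encoding, and the tags only turn back into objects if a matching hook runs while decoding.

```python
import json
from dataclasses import dataclass


@dataclass
class Pt:
    """Hypothetical stand-in for a geometry object such as shapely.Point."""
    x: float
    y: float


def tag(obj):
    """Encoder fallback: wrap objects json cannot handle in a tagged dict."""
    if isinstance(obj, Pt):
        return {"_module": __name__, "_class": "Pt", "_object": [obj.x, obj.y]}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Assumed registry that maps class names to constructors for the decode side.
REGISTRY = {"Pt": lambda payload: Pt(*payload)}


def revive(d):
    """Decoder hook: turn tagged dicts back into objects, leave other dicts alone."""
    if "_class" in d and d["_class"] in REGISTRY:
        return REGISTRY[d["_class"]](d["_object"])
    return d


table = {"a": [1, 2], "b": [Pt(1, 4), Pt(4, 6)]}
text = json.dumps(table, default=tag)

without_hook = json.loads(text)                    # tags stay behind as plain dicts
with_hook = json.loads(text, object_hook=revive)   # tags become Pt objects again
```

If the assumption in this report is right, the DataFrame path corresponds to the `without_hook` case: the tagging step runs, but nothing comparable to `revive` is applied to the cell values when the frame is parsed.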
I know that serializing a dataframe is not a good use case for the pandapower.to_json function, but in some cases I store shapely data inside my net dataframes without making them geopandas dataframes. Additionally, I sometimes use more than one column with geodata, which is why I added the geopandas part of the code. It is impossible to encode GeoDataFrames that contain geodata outside the "geometry" column, as the following error occurs:
Traceback (most recent call last):
File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 448, in default
s = to_serializable(o)
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/functools.py", line 909, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 985, in json_geodataframe
d = with_signature(obj, obj.to_json())
^^^^^^^^^^^^^
File "/home/daniel/.virtualenvs/retoflow/lib/python3.11/site-packages/geopandas/geodataframe.py", line 782, in to_json
return json.dumps(
^^^^^^^^^^^
File "/usr/lib/python3.11/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/encoder.py", line 200, in encode
chunks = self.iterencode(o, _one_shot=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/encoder.py", line 258, in iterencode
return _iterencode(o, 0)
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/encoder.py", line 180, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Point is not JSON serializable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/daniel/.virtualenvs/retoflow/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-24-0801906cf0dd>", line 1, in <module>
gdf_str = pp.to_json({"geo": gdf})
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/daniel/workspace/pandapower/pandapower/file_io.py", line 132, in to_json
json_string = json.dumps(net, cls=io_utils.PPJSONEncoder, indent=2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/__init__.py", line 238, in dumps
**kw).encode(obj)
^^^^^^^^^^^
File "/usr/lib/python3.11/json/encoder.py", line 202, in encode
chunks = list(chunks)
^^^^^^^^^^^^
File "/usr/lib/python3.11/json/encoder.py", line 432, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib/python3.11/json/encoder.py", line 406, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.11/json/encoder.py", line 439, in _iterencode
o = _default(o)
^^^^^^^^^^^
File "/home/daniel/workspace/pandapower/pandapower/io_utils.py", line 451, in default
return json.JSONEncoder.default(self, o)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/encoder.py", line 180, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type GeoDataFrame is not JSON serializable
Expected Behavior
It would be great to retrieve shapely data even from within dataframes, or from geodataframe columns other than the geometry column. Any ideas on that?
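Until something like that exists, one possible workaround is to convert the shapely columns to WKT strings before serializing and parse them back afterwards. This is only a sketch under assumptions: `geoms_to_wkt` and `wkt_to_geoms` are hypothetical helper names, the caller has to know which columns hold geometries, and plain pandas JSON stands in for `pp.to_json` to keep the example self-contained. WKT carries no CRS, so any CRS information would have to be stored separately.

```python
import io

import pandas as pd
import shapely
from shapely import wkt


def geoms_to_wkt(df, columns):
    """Return a copy of df with the given geometry columns encoded as WKT strings."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(lambda geom: geom.wkt)
    return out


def wkt_to_geoms(df, columns):
    """Return a copy of df with the given WKT string columns parsed back to geometries."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(wkt.loads)
    return out


df = pd.DataFrame({"a": [1, 2],
                   "b": [shapely.Point([1, 4]), shapely.LineString([[1, 2], [4, 6]])]})

plain = geoms_to_wkt(df, ["b"])   # only ints and strings remain in the frame
json_str = plain.to_json()        # plain pandas JSON, no custom handler involved
restored = wkt_to_geoms(pd.read_json(io.StringIO(json_str)), ["b"])
```

The same two helpers could be applied to the extra geodata columns of a GeoDataFrame before it is handed to the serializer, leaving only the active geometry column as real geometry.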
Installed Versions
INSTALLED VERSIONS
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.11.9.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-35-generic
Version : #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8
pandas : 1.5.3
numpy : 1.23.5
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.0.0
pip : 24.0
Cython : 3.0.9
pytest : 8.1.1
hypothesis : 6.82.7
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader: None
bs4 : 4.12.3
bottleneck : None
brotli : 1.1.0
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.3
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : 2024.1
Label
- [X] Relevant labels are selected