pypowsybl
Use pandas.Categorical for enum types
- Do you want to request a feature or report a bug?
Feature
- What is the current behavior?
Enum attributes are represented as strings in the provided dataframes, which is neither memory efficient nor very descriptive.
- What is the expected behavior?
Enum attributes should use the pd.Categorical type, which pandas provides for data with only a few valid values.
- What is the motivation / use case for changing the behavior?
Improved memory efficiency and semantics.
Regarding memory efficiency, here is a comparison on a large network:
>>> network.get_generators().memory_usage(deep=True)
...
energy_source 316067
...
>>> network.get_generators().astype('category').memory_usage(deep=True)
...
energy_source 5652
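The same effect can be reproduced without a network. The snippet below is a minimal sketch on synthetic data: the category values and the row count are assumptions chosen for illustration, not taken from pypowsybl.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an enum column; these values are illustrative only.
values = ["HYDRO", "NUCLEAR", "WIND", "THERMAL", "SOLAR", "OTHER"]
rng = np.random.default_rng(0)
energy_source = pd.Series(rng.choice(values, size=10_000), name="energy_source")

# deep=True also counts the string payloads, not just the per-row references.
as_strings = energy_source.memory_usage(index=False, deep=True)
as_category = energy_source.astype("category").memory_usage(index=False, deep=True)

print(f"strings:  {as_strings} bytes")
print(f"category: {as_category} bytes")
```

The categorical column stores one small integer code per row plus a single copy of each distinct value, which is where the saving comes from.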
We can also improve Java-to-Python transfer performance by sending "codes" (integers) instead of names, and creating the series from the underlying codes:
series = pd.Series(pd.Categorical.from_codes(data, dtype=EnergySource))
Note that this will probably require adding some information about the enum type to the series metadata, so that the Python side knows which categorical type to use.
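One possible shape for the Python side is sketched below. Everything named here is hypothetical: the `ENUM_DTYPES` registry, the `series_from_codes` helper, the `"EnergySource"` key, and the category list are assumptions for illustration, not existing pypowsybl API. The only library calls relied on are `pd.CategoricalDtype` and `pd.Categorical.from_codes`.

```python
import numpy as np
import pandas as pd

# Hypothetical category list; the real values and their order would have to
# match the Java enum so that the integer codes line up.
ENERGY_SOURCE = pd.CategoricalDtype(
    categories=["HYDRO", "NUCLEAR", "WIND", "THERMAL", "SOLAR", "OTHER"]
)

# Hypothetical registry keyed by the enum type name carried in series metadata.
ENUM_DTYPES = {"EnergySource": ENERGY_SOURCE}


def series_from_codes(codes, enum_type_name, index=None, name=None):
    """Build a categorical series from integer codes and an enum type name."""
    dtype = ENUM_DTYPES[enum_type_name]
    categorical = pd.Categorical.from_codes(codes, dtype=dtype)
    return pd.Series(categorical, index=index, name=name)


# Codes as they might arrive from the Java side (assumed layout).
codes = np.array([0, 1, 1, 2], dtype=np.int8)
series = series_from_codes(codes, "EnergySource", name="energy_source")
print(series)
```

With this layout, only the integer codes and the enum type name need to cross the Java/Python boundary; the category names live once on the Python side.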