
Use pandas.Categorical for enum types

Open sylvlecl opened this issue 3 years ago • 1 comments

  • Do you want to request a feature or report a bug?

Feature

  • What is the current behavior?

Enum attributes are represented as plain strings in the provided dataframes, which is neither memory efficient nor very descriptive.

  • What is the expected behavior?

pandas provides the pd.Categorical type for data that takes only a few distinct valid values. Enum attributes should be exposed with this type.

  • What is the motivation / use case for changing the behavior?

Improved memory efficiency and semantics.

Regarding memory efficiency, here is the memory usage (in bytes) measured on a large network:

>>> network.get_generators().memory_usage(deep=True)
...
energy_source           316067
...

>>> network.get_generators().astype('category').memory_usage(deep=True)
...
energy_source             5652
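The same effect can be reproduced without a network. This is only a rough sketch on synthetic data: the category names and the row count are made up, and the actual numbers will differ from the ones above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an enum column; values and row count are assumptions.
rng = np.random.default_rng(0)
values = ["HYDRO", "NUCLEAR", "WIND", "THERMAL", "SOLAR", "OTHER"]
as_str = pd.Series(rng.choice(values, size=10_000), name="energy_source")

# Same data stored as a categorical: one small integer code per row,
# plus a single copy of each distinct string.
as_cat = as_str.astype("category")

str_bytes = as_str.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)
print(f"strings: {str_bytes} bytes, categorical: {cat_bytes} bytes")
```

With so few distinct values, pandas stores the codes as 8-bit integers, which is where most of the saving comes from.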

We can also improve Java-to-Python transfer performance by sending the integer "codes" instead of the enum names, and then building the series directly from those codes:

series = pd.Series(pd.Categorical.from_codes(data, dtype=EnergySource))
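For the line above to work, `EnergySource` has to be a `pd.CategoricalDtype`. A minimal self-contained sketch follows; the category names, their order, and the sample codes are assumptions for illustration, and the order would have to match whatever ordinals the Java side sends.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical dtype. The category list and its order are assumed;
# they must line up with the integer codes produced on the Java side.
EnergySource = pd.CategoricalDtype(
    categories=["HYDRO", "NUCLEAR", "WIND", "THERMAL", "SOLAR", "OTHER"]
)

# Integer codes as they might arrive from Java (made-up data).
data = np.array([0, 2, 2, 5, 1], dtype=np.int8)

# from_codes wraps the codes in a Categorical; no string objects are
# created per row.
series = pd.Series(pd.Categorical.from_codes(data, dtype=EnergySource))
print(series)
```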

sylvlecl avatar Feb 02 '22 09:02 sylvlecl

Note that this will probably require adding some information about the enum type to the series metadata, so that the Python side knows which categorical type to use.
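One possible shape for this, as a sketch only: the series metadata carries an enum type name, and the Python side looks it up in a registry. `ENUM_DTYPES` and `build_series` are hypothetical names, not existing pypowsybl API, and the category list is assumed.

```python
import pandas as pd

# Hypothetical registry mapping an enum type name (from the series metadata)
# to the categorical dtype to use on the Python side.
ENUM_DTYPES = {
    "EnergySource": pd.CategoricalDtype(["HYDRO", "NUCLEAR", "WIND"]),
}


def build_series(codes, enum_name=None):
    """Return a categorical series when the metadata names a known enum,
    otherwise a plain series of the raw values."""
    dtype = ENUM_DTYPES.get(enum_name)
    if dtype is None:
        return pd.Series(codes)
    return pd.Series(pd.Categorical.from_codes(codes, dtype=dtype))


print(build_series([0, 2, 1], enum_name="EnergySource"))
```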

sylvlecl avatar Feb 02 '22 11:02 sylvlecl