pypowsybl

Missing attributes representation

Open sylvlecl opened this issue 3 years ago • 1 comments

  • Do you want to request a feature or report a bug?

Feature.

  • What is the current behavior?

Missing elements are represented as nan for float attributes and as an empty string for string attributes, but we don't have any way to represent an absent integer or boolean value.
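To make the limitation concrete, here is a minimal sketch using plain pandas and NumPy only (it does not call pypowsybl; the series names and the G1/G2 index are illustrative). It shows what happens by default when an integer or boolean column has to hold a missing value:

```python
import numpy as np
import pandas as pd

# Float columns can carry NaN as a "missing" marker.
floats = pd.Series([1.5, np.nan], index=["G1", "G2"])

# A default integer column has no such marker: adding a missing value
# silently turns the whole column into float64.
ints = pd.Series([3, None], index=["G1", "G2"])

# A boolean column with a missing value falls back to the generic object dtype.
bools = pd.Series([True, None], index=["G1", "G2"])

print(floats.dtype, ints.dtype, bools.dtype)
```

In both cases the original integer or boolean type is lost, which is the gap this issue is about.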

This becomes very annoying now that dataframes also represent IIDM extensions, which can be absent from some elements.

  • What is the expected behavior?

See discussion here.

The proposal is to use the pandas missing data features, although they are experimental. The pandas IntegerArray and the equivalent BooleanArray work with a boolean mask which defines which values are absent. We could also use this for absent strings.
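As a rough sketch of the pandas side of this proposal, the snippet below builds masked arrays directly from a values array and a boolean mask, where True in the mask means "absent". The sample values and variable names are made up for illustration. The nullable "string" dtype is one possible option for absent strings, not something decided in this issue.

```python
import numpy as np
import pandas as pd

# Integer data plus a mask: the second entry is absent, so its stored value is ignored.
int_values = np.array([10, 0, 30], dtype=np.int64)
int_mask = np.array([False, True, False])
int_col = pd.arrays.IntegerArray(int_values, int_mask)

# Same idea for booleans: the third entry is absent.
bool_values = np.array([True, False, False])
bool_mask = np.array([False, False, True])
bool_col = pd.arrays.BooleanArray(bool_values, bool_mask)

# Assumption: absent strings could use the nullable "string" dtype.
str_col = pd.array(["a", None], dtype="string")

print(int_col)
print(bool_col)
print(str_col)
```

The masked entries are displayed as `<NA>`, and the columns keep their integer and boolean dtypes.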

As an illustration, let's imagine a generators dataframe with a "regulation" extension, where G2 would not have that extension:

>>> network.get_generators(attributes=['in_regulation'])
id   in_regulation
G1   True
G2   <NA>

In order to implement this, we need to introduce the concept of a masked array in our Java library and in the C struct array.
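On the Python side, the conversion could look roughly like the sketch below. It assumes the native layer hands back each column as a pair of arrays (values and mask). The helper `column_from_masked` is hypothetical and is not part of pypowsybl; the data simply reproduces the G1/G2 illustration above.

```python
import numpy as np
import pandas as pd


def column_from_masked(values, mask):
    """Hypothetical helper: wrap a (values, mask) pair in a nullable pandas array.

    mask[i] is True when element i is absent.
    """
    values = np.asarray(values)
    mask = np.asarray(mask, dtype=bool)
    if values.dtype == np.bool_:
        return pd.arrays.BooleanArray(values, mask)
    if np.issubdtype(values.dtype, np.integer):
        return pd.arrays.IntegerArray(values, mask)
    raise TypeError(f"unsupported dtype for a masked column: {values.dtype}")


# G2 has no "regulation" extension, so its entry is masked out.
generators = pd.DataFrame(
    {"in_regulation": column_from_masked([True, False], [False, True])},
    index=pd.Index(["G1", "G2"], name="id"),
)
print(generators)
```

The value stored under a masked entry is arbitrary. Only the mask decides whether the element is shown as `<NA>`.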

  • What is the motivation / use case for changing the behavior?

Correctly represent missing data in dataframes, in particular IIDM extensions data.

sylvlecl, Nov 10 '21 16:11

See also discussions on #102

sylvlecl, Feb 02 '22 13:02