pypowsybl
Missing attributes representation
- Do you want to request a feature or report a bug?
Feature.
- What is the current behavior?
Float attributes represent missing elements as NaN, and string attributes use an empty string, but we have no way to represent an absent integer or boolean value.
This becomes very annoying now that we are adding a representation of IIDM extensions, since an extension can be absent from some elements.
- What is the expected behavior?
See discussion here.
The proposal is to use the pandas missing-data features, although they are still experimental. The pandas IntegerArray and the equivalent BooleanArray work with a boolean mask that defines which values are absent. We could also use the same mechanism for absent strings.
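For reference, here is a minimal sketch of how the pandas nullable dtypes behave. It uses only plain pandas, nothing from pypowsybl, and the sample values are made up for illustration:

```python
import pandas as pd

# Nullable extension dtypes: note the capital "I" in "Int64".
ints = pd.array([1, None, 3], dtype="Int64")
bools = pd.array([True, None, False], dtype="boolean")
strs = pd.array(["a", None], dtype="string")

# Missing entries are tracked by a mask instead of a sentinel value,
# so the integer column keeps an integer dtype.
print(ints.dtype, bools.dtype)
print(ints.isna())

# Reductions skip missing entries by default.
total = ints.sum()
```

Unlike the NaN approach, the integer column is not silently converted to float when a value is missing.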
As an illustration, let's imagine a generators dataframe with a "regulation" extension, where G2 would not have that extension:
>>> network.get_generators(attributes=['in_regulation'])
id in_regulation
G1 True
G2 <NA>
To implement this, we need to introduce the concept of a masked array in our Java library and in the C struct array.
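On the Python side, the last step could look roughly like the sketch below, assuming the native layer hands back a plain values buffer plus a boolean mask where True means "missing" (the convention pandas itself uses). The helper name `masked_boolean_series` and the sample data are hypothetical and only serve the illustration; they are not part of pypowsybl:

```python
import numpy as np
import pandas as pd


def masked_boolean_series(ids, values, mask, name):
    """Hypothetical helper: build a nullable boolean column from
    raw values and a mask (True in the mask means the value is absent)."""
    arr = pd.arrays.BooleanArray(
        np.asarray(values, dtype=bool),
        np.asarray(mask, dtype=bool),
    )
    return pd.Series(arr, index=pd.Index(ids, name="id"), name=name)


# G2 has no "regulation" extension, so its slot is masked out;
# the underlying value (False here) is just a placeholder.
series = masked_boolean_series(
    ids=["G1", "G2"],
    values=[True, False],
    mask=[False, True],
    name="in_regulation",
)
df = series.to_frame()
print(df)

# The same pattern applies to integers with pd.arrays.IntegerArray.
int_arr = pd.arrays.IntegerArray(
    np.array([10, 0], dtype="int64"),
    np.array([False, True]),
)
```

Carrying the mask alongside the values means the Java and C layers never need a sentinel value for "absent", which is what makes integers and booleans representable.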
- What is the motivation / use case for changing the behavior?
Correctly represent missing data in dataframes, in particular IIDM extensions data.
See also discussions on #102