pypowsybl
qMax of shunt does not have a float type
- Do you want to request a feature or report a bug? This is a bug report.
- What is the current behavior? The qMax attribute of shunts is returned with an object dtype.
- If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem
shunts = network.get_shunt_compensators()
print(shunts.info())
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 model_type 2396 non-null object
1 p 0 non-null float64
2 q 2396 non-null float64
3 voltage_level_id 2396 non-null object
4 bus_id 2396 non-null object
5 VECTCodeOITransfoDistrib 2396 non-null object
6 qMax 2396 non-null object
- What is the expected behavior? qMax should have the float64 dtype.
- What is the motivation / use case for changing the behavior? I can't filter or sort the dataframe based on qMax values.
- Please tell us about your environment:
- PyPowsybl Version : 0.8.0
- OS Version: Windows 10 Entreprise 64-bit (10.0, Build 18363) (18362.19h1_release.190318-1202)
Some thoughts about that issue:
The implementation issue here is that qMax is format-specific data (not part of standard IIDM), which is why it's stored as a generic String property in the network.
When converting to a dataframe, for now, we don't try to detect the type of the column, hence the object dtype.
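To make the consequence concrete, here is a minimal sketch using plain pandas. The dataframe below is a hypothetical stand-in for the result of get_shunt_compensators() (the ids and values are invented); it only assumes that the property values arrive as strings.

```python
import pandas as pd

# Hypothetical stand-in for the shunt dataframe: qMax is held as strings,
# the way a generic String property would arrive.
ids = ["S1", "S2", "S3"]
shunts = pd.DataFrame(
    {
        "q": pd.Series([1.5, 2.5, 3.5], index=ids),
        "qMax": pd.Series(["100.0", "20.0", "3.0"], index=ids, dtype=object),
    }
)

# Sorting an object column of strings is lexicographic, not numeric.
sorted_as_text = list(shunts.sort_values("qMax").index)

# Converting explicitly to float restores numeric ordering.
as_float = shunts.assign(qMax=shunts["qMax"].astype(float))
sorted_as_number = list(as_float.sort_values("qMax").index)

print(sorted_as_text)
print(sorted_as_number)
```

Until the column type is detected by the library, an explicit astype(float) on the user side is one possible workaround, provided every element actually carries the property.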
We can try to detect it when the dataframe is built, but it brings some issues because the properties can be absent from some elements. In that case, for float we could replace it with a NaN, but for bool we have to choose either True or False, and for int we have to choose a "default" value.
Note that this will also be an issue with more "typed" data (like IIDM extensions), though, so it's broader than the current issue.
A possibility would be to use pandas nullable integer and boolean types for those columns (see https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html). To build that kind of column, we would need to change the format of the data returned from Java, perhaps with 2 series for 1 column (1 for data, 1 for masks).
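On the Python side, the "data plus mask" idea could look roughly like the sketch below. The arrays are invented and simply assume that the backend hands over one dense value array and one boolean mask per column (True meaning "missing"); this is not the actual pypowsybl data format.

```python
import numpy as np
import pandas as pd

# Assumed wire format: a dense data array plus a boolean mask per column.
# Masked positions carry an arbitrary placeholder value.
int_data = np.array([1, 0, 3], dtype=np.int64)
int_mask = np.array([False, True, False])  # True = value is missing

bool_data = np.array([True, False, False])
bool_mask = np.array([False, True, False])

# pandas nullable arrays are built directly from (values, mask) pairs.
int_col = pd.Series(pd.arrays.IntegerArray(int_data, int_mask))
bool_col = pd.Series(pd.arrays.BooleanArray(bool_data, bool_mask))

print(int_col.dtype, bool_col.dtype)
print(int_col.isna().tolist())
```

The appeal of this layout is that no "default" value has to be chosen: the placeholder under a masked position is never exposed to the user.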
If we don't want to cope with that issue now, we can have a first evolution handling only float columns.
Thank you for the clarifications.
It seems the function pandas.DataFrame.convert_dtypes() can manage this kind of problem, can't it? It tries to convert columns to the best possible dtypes, and it supports the nullable pd.NA.
Thanks. Actually, on its own, it does not try to parse strings: it only changes the dtype of the column if the objects are already numbers or booleans. But indeed, in combination with methods like pd.to_numeric, it seems to achieve more or less what we need, i.e. type conversion plus filling empty rows with pd.NA.
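A small sketch of that combination, on an invented qMax column with one missing entry (the values are illustrative only):

```python
import pandas as pd

# Invented string-valued column; None means the property is absent.
raw = pd.Series(["100.5", None, "3.25"], dtype=object, name="qMax")

# On its own, convert_dtypes() does not parse the strings into numbers.
alone = raw.convert_dtypes()

# Parsing first with pd.to_numeric, then calling convert_dtypes(),
# gives a nullable numeric column where the missing entry is pd.NA.
parsed = pd.to_numeric(raw, errors="coerce").convert_dtypes()

print(alone.dtype)
print(parsed.dtype)
print(parsed.isna().tolist())
```

Note that errors="coerce" also turns unparseable strings into missing values, which may or may not be desirable for arbitrary properties.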
There is also a matter of performance, though: it's much more efficient to pass numeric arrays from Java to Python than to pass strings (which need to be copied and encoded, and need 1 memory allocation for each string, etc.). From that point of view, it would probably be better if we could already achieve the conversion on the Java side.