qMax of shunt has not a float type

Open vincentbarbesant opened this issue 3 years ago • 3 comments

  • Do you want to request a feature or report a bug? This is a bug report.

  • What is the current behavior? The qMax attribute of shunts is returned with the object dtype.

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

shunts = network.get_shunt_compensators()
print(shunts.info())
Data columns (total 7 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   model_type                2396 non-null   object 
 1   p                         0 non-null      float64
 2   q                         2396 non-null   float64
 3   voltage_level_id          2396 non-null   object 
 4   bus_id                    2396 non-null   object 
 5   VECTCodeOITransfoDistrib  2396 non-null   object 
 6   qMax                      2396 non-null   object
  • What is the expected behavior? qMax should be of float64 type.

  • What is the motivation / use case for changing the behavior? I can't filter or sort the dataframe based on qMax values

  • Please tell us about your environment:

    • PyPowsybl Version : 0.8.0
    • OS Version: Windows 10 Entreprise 64-bit (10.0, Build 18363) (18362.19h1_release.190318-1202)

vincentbarbesant, Jul 20 '21 13:07

Some thoughts about that issue:

The implementation issue here is that qMax is format-specific data (not part of standard IIDM), which is why it is stored as a generic String property in the network.

When building the dataframe, for now, we don't try to detect the type of the column, hence the object dtype.

We can try to detect it when the dataframe is built, but it brings some issues because the properties can be absent from some elements. In that case, for float we could replace it with a NaN, but for bool we have to choose either True or False, and for int we have to choose a "default" value. Note that this will also be an issue with more "typed" data (like IIDM extensions), though, so it's broader than the current issue.
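As a minimal illustration of the missing-value problem described above (a sketch with made-up values in plain pandas, not pypowsybl code), this is what happens when one element lacks the property:

```python
import pandas as pd

# Hypothetical property values for three elements; the second element
# does not define the property at all.
float_col = pd.Series([1.5, None, 3.0])
int_col = pd.Series([1, None, 3])
bool_col = pd.Series([True, None, False])

print(float_col.dtype)  # float64: the gap becomes NaN
print(int_col.dtype)    # float64: the integers are upcast so NaN can be stored
print(bool_col.dtype)   # object: plain bool has no missing-value marker
```

Only the float case keeps its natural dtype, which is why a first evolution limited to float columns would be the simplest one.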

A possibility would be to use the pandas nullable integer and boolean types for those columns (see https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html). To build that kind of column, we will need to change the format of the data returned from Java, maybe with two series per column (one for the data, one for the mask).
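A rough sketch of the "data plus mask" idea on the Python side, assuming the two arrays have already arrived as NumPy arrays (the array names and values are hypothetical, and nothing here reflects an actual pypowsybl data format):

```python
import numpy as np
import pandas as pd

# Stand-ins for the two arrays the Java side would hand over
# (hypothetical layout: one values array plus one mask array).
int_values = np.array([10, 0, 30], dtype=np.int64)
bool_values = np.array([True, False, False])
missing = np.array([False, True, False])  # True marks a missing entry

# pandas masked arrays are built directly from (values, mask) pairs.
nullable_int = pd.Series(pd.arrays.IntegerArray(int_values, missing))
nullable_bool = pd.Series(pd.arrays.BooleanArray(bool_values, missing))

print(nullable_int.dtype)   # Int64
print(nullable_bool.dtype)  # boolean
print(nullable_int.isna().tolist())  # [False, True, False]
```

The placeholder stored in the values array at a masked position (0 or False here) is never exposed, so no "default" value has to be chosen for the user.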

If we don't want to cope with that issue now, we can have a first evolution handling only float columns.

sylvlecl, Jul 22 '21 07:07

Thank you for the clarifications.

It seems that pandas.DataFrame.convert_dtypes() can handle this kind of problem, can't it? It tries to convert columns to the best possible dtypes, and it supports the nullable pd.NA.

vincentbarbesant, Jul 22 '21 11:07

Thanks. Actually, on its own, convert_dtypes does not try to parse strings; it only changes the dtype of a column if the objects are already numbers or booleans. But indeed, combined with methods like pd.to_numeric, it seems to achieve more or less what we need, i.e. type conversion plus filling empty rows with pd.NA.
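A small sketch of that difference, using a made-up string column as a stand-in for qMax (it assumes a pandas version that has the nullable Float64 dtype):

```python
import pandas as pd

# Stand-in for a property column that arrives as strings, with one gap.
raw = pd.Series(["1.5", "2.0", None], dtype=object)

# convert_dtypes alone: the values stay strings, so the column is
# still not numeric.
alone = raw.convert_dtypes()

# Parse the strings first, then let convert_dtypes pick a nullable dtype.
parsed = pd.to_numeric(raw, errors="coerce").convert_dtypes()

print(parsed.dtype)          # Float64
print(parsed[parsed > 1.6])  # filtering now works; the missing row is dropped
```

This could also serve as a user-side workaround for filtering and sorting until the column is typed at the source.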

There is also a matter of performance, though: it is much more efficient to pass numeric arrays from Java to Python than to pass strings (which need to be copied and encoded, and need one memory allocation per string, etc.). From that point of view, it would probably be better to do the conversion on the Java side.
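To give a rough feel for the allocation point, here is a Python-only comparison with made-up values. It does not measure the actual Java-to-Python transfer, only the footprint of the two representations once they are in Python:

```python
import sys
import numpy as np

# 1000 hypothetical property values.
values = [i / 10 for i in range(1000)]

as_floats = np.array(values, dtype=np.float64)
as_strings = [str(v) for v in values]

# One contiguous buffer, 8 bytes per value.
numeric_bytes = as_floats.nbytes
# One separately allocated Python object per value.
string_bytes = sum(sys.getsizeof(s) for s in as_strings)

print(numeric_bytes, string_bytes)
```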

sylvlecl, Jul 22 '21 12:07