pypowsybl icon indicating copy to clipboard operation
pypowsybl copied to clipboard

overcoming some python strict typing problems

Open PatrickBourdon opened this issue 3 years ago • 5 comments

  • Do you want to request a feature or report a bug? Neither a bug nor a feature for a functional request, typing issues.

  • Please tell us about your environment:

    • Platform: Windows-10-10.0.19042-SP0
    • Python: 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)] on win32
    • pypowsybl: 0.12.0
    • Dev environment: VSCode 1.63.2, Pylance v2022.1.3
  • typing: I use and recommend the use of python strict type checking, for it prevents many bugs and adds autodocumentation. I faced some typing issues using pypowsybl. To overcome them I created a local ElementType(Enum) class and a pn.Network subclass. Hereafter they are and the explaination follows:

from enum import Enum, auto # https://docs.python.org/3/library/enum.html
import gzip # https://docs.python.org/3/library/gzip.html
from os.path import abspath # https://docs.python.org/3/library/os.path.html
from pathlib import Path # https://docs.python.org/3/library/pathlib.html
from typing import Union, Any # https://docs.python.org/3/library/typing.html

from pandas import DataFrame # https://pandas.pydata.org/docs/reference/frame.html

import _pypowsybl # (non recommended hack) https://www.powsybl.org/pages/documentation/grid/model/
import pypowsybl.network as pn # https://pypowsybl.readthedocs.io/en/stable/reference/network.html

class ElementType(Enum):
    BATTERY = auto()
    BUS = auto()
    BUSBAR_SECTION = auto()
    CURRENT_LIMITS = auto()
    DANGLING_LINE = auto()
    GENERATOR = auto()
    HVDC_LINE = auto()
    LCC_CONVERTER_STATION = auto()
    LINE = auto()
    # ... and all others ...

class Network(pn.Network):

    def __init__(self, path: Path, parameters: Union[None, dict[str, Any]]=None) -> None:
        """ 
        Loads a pypowsybl network from a file:
        - path (str): the path of the file to load
        - parameters (dict, optional): a map of parameters

        NOTE 1: The format of the network description is discovered from the file extension (e.g. .xiidm)
        NOTE 2: If the file is gzip-compressed (e.g. .xiidm.gz), the network is loaded without an intermediate decompressed file.
        """
        print(f'Loading network from path({path}) ...')
        if path.suffix.lower() == '.gz':
            with gzip.open(abspath(path), 'rt', encoding='utf_8_sig') as text_io:
                super().__init__(_pypowsybl.load_network_from_string(path.stem, text_io.read(), parameters or {})) # type: ignore
                # '.gz' extension has to be removed, otherwise the network format isn't recognized 
                # and the following exception is raised: 
                #   _pypowsybl.PyPowsyblError: Unsupported file format or invalid file.
        else:
            super().__init__(_pypowsybl.load_network(abspath(path), parameters or {})) # type: ignore
        print(str(self))
        return

    def get_elements(self, element_type: ElementType) -> DataFrame:
        """
        Overloads and calls the same method of the superclass pn.Network.
        The difference is the use of the locally defined enumerated 'ElementType'.
        """
        pn_element_type = next((x for x in pn.ElementType.__members__.values() if x.name == element_type.name)) # type: ignore
        return super().get_elements(pn_element_type) # type: ignore

    def elements_count(self) -> DataFrame:
        """ 
        Returns a dataframe with the elements count of each ElementType.
        """
        df = DataFrame({'count': [len(self.get_elements(x)) for x in ElementType.__members__.values()] },
                      index=list(ElementType.__members__))
        df.index.name = 'ElementType'
        return df

def main() -> None:
    network: Network = Network(Path('20220101T0000Z_20220101T0000Z_pf.xiidm.gz'))
    print(network.elements_count())
    lines: DataFrame = network.get_elements(ElementType.LINE)
    print(lines)
    return

@main()

1/ Why ElementType(Enum)? pn.ElementType or pn.ElementType.BUS are both accessible respectively as <class '_pypowsybl.ElementType'> and <ElementType.BUS: 0>. However the definition of the latter is not accessible to the type checker and an error is reported when using any element type such as: df: DataFrame = network.get_elements(pn.ElementType.BUS)

2/ Why pn.Network subclass? To overload get_elements using the ElementType(Enum) without typing errors, whilst still using any other pypowsybl.network methods. The new get_elements wraps (once for all) the typing errors, hidden by # type: ignore. elements_count is a basic example using get_elements: it reports the size of the loaded network.

Note: get_elements is NOT described in the documentation but it is a public method. update_elements also has a pn.ElementType parameter and it is part of the documentation.

3/ About __init__: The method of the created subclass adds a feature: direct read of a .xiidm.gz. The method wraps and hides the typing errors from load_network_from_string and load_network.

The typing errors come from their signatures: parameters: dict = {}. dict alone misses some information. It should probably be something like dict[str, Any].

Note that Using a dict as a default parameter is not recommended: but this is beyond typing. https://docs.quantifiedcode.com/python-anti-patterns/correctness/mutable_default_value_as_argument.html pylint notifies this as an error in strict mode. The recommended way is probably parameters: Union[None, dict[str, Any]]=None (python 3.9) parameters: None | dict[str, Any] = None (python 3.10)

4/ The file network.py includes the class pn.Network and also reads:

from typing import List as _List # (1)
from typing import Set as _Set # (2)
from pandas import DataFrame as _DataFrame # (3)

Use oftypint.List (1) et typint.Set (3) are deprecated since 3.9: they are now just list and set that nobody wants to make private (I am unsure if the new way to use them is available from python 3.7). https://docs.python.org/3/library/typing.html#typing.List https://docs.python.org/3/library/typing.html#typing.Set

From the typing point-of-view, the types of parameters and result of any public method MUST be public. Otherwise the caller cannot type his variables, and a type checker in strict mode complains. A signature of a public method like def my_method(self) -> _DataFrame: is strange by essence, for it says that the result is a private type. However the good news is that the type checker understands:

from pandas import DataFrame
df: DataFrame = network.my_method()

since imported DataFrame == np._DataFrame. Therefore (3) only prevents from writing from np import DataFrame, which would be anyway harmless.

Why do you make private systematically every imported object, such as : import datetime as _datetime?

PatrickBourdon avatar Jan 26 '22 18:01 PatrickBourdon

Thanks for the detailed analysis!

Just a quick answer to this part:

Why do you make private systematically every imported object, such as : import datetime as _datetime?

The purpose is to have clean namespaces: for example if we import datetime in pypowsybl.network, it will appear as being part of our own API, whereas it is not (you will never want to do from pypowsybl.network import datetime, and you don't want your IDE to propose it with autocompletion, or it will be messy to use).

For now, we try to keep our namespaces tidy by importing other APIs with "private" scope. Maybe a more powerful solution, which seems broadly used, would be to move our implementations into inner modules, and build our public namespaces by importing in them only those pieces that we want to expose.

For example we could:

  • Define Network in pypowsybl.network.core with any import with any scope
  • In pypowsybl.network.__init__: import Network class

This should enable us to keep namespaces tidy, while not making imports private.

sylvlecl avatar Jan 27 '22 16:01 sylvlecl

The work in #306 and #307 should solve some of the typing problems you reported:

  • we will now provide stubs for _pypowsybl
  • the integration process now includes a verification with mypy static checker, that all methods have type annotations, and that there is no typing problem. Note though, that it might behave slightly differently from the pyright static type check, which I think is included in pylance. In particular, for now, arguments are implicitly considered optional when their default value is None. This may be improved in a further evolution.

Those improvements will be included in next release.

sylvlecl avatar Feb 08 '22 09:02 sylvlecl

👍

PatrickBourdon avatar Feb 08 '22 17:02 PatrickBourdon

Hi @PatrickBourdon

Did you have any opportunity to check if typing problems have indeed been solved in the last releases of pypowsybl, or if some remain ?

sylvlecl avatar May 11 '22 14:05 sylvlecl

Fixed additional missing types for __members__, following your mail feedback: #396

sylvlecl avatar May 12 '22 08:05 sylvlecl