pypowsybl
Network loading API: support for binary streams
- Do you want to request a feature or report a bug?
Feature.
- What is the current behavior?
We have 2 loading methods:
- load: takes a file path as argument.
- load_from_string: takes the content of a file, as a string, as argument. It does not support bytes input.
So, in particular, in-memory "blobs" cannot be provided as arguments. Streaming content is not supported either.
- What is the expected behavior?
As proposed in #144, we should support byte streams (file objects) as arguments to load.
Type checking and runtime behaviour
After some digging, it seems there is no really standard way to type-check file objects, neither at runtime nor statically.
We have typing.BinaryIO and various classes in the io module, but typing.BinaryIO exposes many more methods than necessary:
we could go for a lighter protocol.
As an example, the pandas library seems to handle static typing with a union of many types (see comment below), and at runtime only checks for the presence of a read method.
--> A good, mixed approach could be to define a simple protocol with at least a read method, and check for its presence at runtime.
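A minimal sketch of that mixed approach, assuming Python 3.8+ for typing.Protocol. ReadableBinary and describe_source are hypothetical names, not pypowsybl API; describe_source only classifies its argument instead of loading anything.

```python
from os import PathLike
from typing import Protocol, Union, runtime_checkable


@runtime_checkable
class ReadableBinary(Protocol):
    """Lightest useful file-object protocol: anything with a read() method."""

    def read(self, size: int = -1) -> bytes: ...


def describe_source(source: Union[str, PathLike, ReadableBinary]) -> str:
    """Classify a load() argument as a path or a readable stream (hypothetical helper)."""
    # Paths keep the existing file-based behaviour.
    if isinstance(source, (str, PathLike)):
        return "path"
    # runtime_checkable isinstance only checks that a `read` attribute exists;
    # it does not verify the signature or that bytes are returned.
    if isinstance(source, ReadableBinary):
        return "stream"
    raise TypeError(f"unsupported source type: {type(source).__name__}")
```

Because the runtime check only looks for a read attribute, an io.StringIO would also be accepted as a "stream"; telling binary and text streams apart is the separate question raised in the next section.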
String input handling
We should deprecate the load_from_string method:
users can provide an in-memory buffer instead, for example an io.BytesIO.
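A sketch of how such a deprecation could look, assuming the text content is UTF-8 encoded. The helper _load_from_buffer is a hypothetical stand-in for a buffer-based loader, and the load_from_string signature shown here is illustrative rather than the real pypowsybl one.

```python
import io
import warnings
from typing import Tuple


def _load_from_buffer(file_name: str, buffer: io.BytesIO) -> Tuple[str, bytes]:
    # Hypothetical stand-in for the real buffer-based loader.
    return file_name, buffer.read()


def load_from_string(file_name: str, file_content: str) -> Tuple[str, bytes]:
    """Deprecated wrapper: forwards text content as an in-memory binary buffer."""
    warnings.warn(
        "load_from_string is deprecated; pass an io.BytesIO buffer instead",
        DeprecationWarning,
        stacklevel=2,  # report the warning at the caller's line
    )
    # Encoding as UTF-8 is an assumption about what the importers expect.
    return _load_from_buffer(file_name, io.BytesIO(file_content.encode("utf-8")))
```

Existing callers keep working during the deprecation period, while new code can build the io.BytesIO itself.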
One question remains: do we accept only binary IO (such as io.BytesIO), or also text IO (such as io.StringIO)?
The latter could be handy for text formats.
If we want to distinguish between the two, there is no standard way to do it. For example, pandas ends up looking for a 'b' character in the "mode" attribute, and also checks the actual class of the object against a predefined set of classes (including io.TextIOWrapper, etc.).
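A sketch of such a check, combining the class test and the 'b'-in-mode test described above with a fallback of our own (not something pandas does): read(0) returns b"" on a binary stream and "" on a text stream without consuming any data. The function name is hypothetical.

```python
import io
from typing import Any


def is_binary_stream(f: Any) -> bool:
    """Best-effort guess whether a file-like object yields bytes (hypothetical helper)."""
    # Standard library classes settle most cases (io.TextIOWrapper is a TextIOBase).
    if isinstance(f, io.TextIOBase):
        return False
    if isinstance(f, (io.RawIOBase, io.BufferedIOBase)):
        return True
    # Other objects: look for a 'b' in their mode string, as pandas does.
    mode = getattr(f, "mode", None)
    if isinstance(mode, str):
        return "b" in mode
    # Fallback (our own addition): read(0) returns b"" or "" without consuming data.
    return isinstance(f.read(0), bytes)
```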
- What is the motivation / use case for changing the behavior?
A use case is being able to load a network from in-memory zip content.
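For reference, pandas uses these unions for typing file inputs:
from io import BufferedIOBase, RawIOBase, TextIOBase, TextIOWrapper
from mmap import mmap
from os import PathLike
from typing import IO, AnyStr, Union
# filenames and file-like-objects
Buffer = Union[IO[AnyStr], RawIOBase, BufferedIOBase, TextIOBase, TextIOWrapper, mmap]
FileOrBuffer = Union[str, Buffer[AnyStr]]
FilePathOrBuffer = Union["PathLike[str]", FileOrBuffer[AnyStr]]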
- I would propose to support only io.BytesIO
- The same should be done for export
- On the packaging of files, I would propose that the function take a list of binary objects and find all XML files regardless of packaging:
Network.load_bytes([object1, object2, objectN])
where each object could be an XML file, a zip file with a single XML inside, or a zip file with multiple XMLs inside
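A minimal sketch of the "find all XMLs regardless of packaging" idea, using only the standard library. It assumes each object is given as raw bytes, treats anything that is not a zip archive as a single XML document, and does not look inside nested zips; collect_xml_files and the generated objectN.xml names are hypothetical.

```python
import io
import zipfile
from typing import Iterable, List, Tuple


def collect_xml_files(blobs: Iterable[bytes]) -> List[Tuple[str, bytes]]:
    """Return (name, content) pairs for every XML found in plain or zipped blobs."""
    found: List[Tuple[str, bytes]] = []
    for index, blob in enumerate(blobs):
        buffer = io.BytesIO(blob)
        if zipfile.is_zipfile(buffer):
            # A zip may hold one or several XML files; other entries are ignored.
            with zipfile.ZipFile(buffer) as archive:
                for name in archive.namelist():
                    if name.lower().endswith(".xml"):
                        found.append((name, archive.read(name)))
        else:
            # Not a zip: assume the blob is itself an XML document (hypothetical naming).
            found.append((f"object{index}.xml", blob))
    return found
```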
Any updates on this?
It seems a change should first be made on the Java side.
Currently: filePath -> Network.load -> DataSource -> Importer find -> Import
Proposal to change the Network.load implementation:
- if a filePath is given: use the current solution
- if no filePath is given but a DataSource is: DataSource -> Network.load -> Importer find -> Import
Then, on the Python side, one could implement the creation of a DataSource from file-like objects, i.e. either DataSource(open(filePath)) or DataSource(io.BytesIO).
This would help integrations in any language, as it would remove the main filesystem dependency from the source code.
https://github.com/powsybl/powsybl-core/blob/8b2850d803738d471c76b1f6ac7c8903af70e3db/commons/src/main/java/com/powsybl/commons/datasource/DataSourceUtil.java#L65
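To make the Python half of this proposal more concrete, here is a sketch of a data source built from in-memory buffers. InMemoryDataSource and its methods are hypothetical; their names loosely echo the Java data source operations (exists, newInputStream, listNames) as we understand them, but this is not a binding to powsybl-core, and from_stream reads the whole stream into memory rather than streaming it.

```python
import io
import re
from typing import BinaryIO, Dict, Set


class InMemoryDataSource:
    """Hypothetical read-only data source over named in-memory files."""

    def __init__(self, base_name: str, files: Dict[str, bytes]) -> None:
        self.base_name = base_name
        self._files = dict(files)

    @classmethod
    def from_stream(cls, file_name: str, stream: BinaryIO) -> "InMemoryDataSource":
        # Reads the whole stream into memory: simple, but not streaming.
        base_name = file_name.rsplit(".", 1)[0]
        return cls(base_name, {file_name: stream.read()})

    def exists(self, file_name: str) -> bool:
        return file_name in self._files

    def new_input_stream(self, file_name: str) -> BinaryIO:
        if file_name not in self._files:
            raise FileNotFoundError(file_name)
        # A fresh stream per call, so independent readers never share a position.
        return io.BytesIO(self._files[file_name])

    def list_names(self, regex: str) -> Set[str]:
        # Whole-name match; Python's re syntax is close to, but not identical to, Java's.
        pattern = re.compile(regex)
        return {name for name in self._files if pattern.fullmatch(name)}
```

With something like this, DataSource(io.BytesIO) from the proposal corresponds to InMemoryDataSource.from_stream("network.xiidm", buffer), and several profiles can share one source through the dictionary constructor.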
An easy solution would be to fully load the byte stream on the Python side and then transfer it as-is to the Java side, but it would not be memory efficient. The right (and hardest!) way is to somehow connect Python's BytesIO.read to Java's InputStream.read, so that the bytes are really streamed end to end.
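A sketch of the Python half of a streamed transfer: the source is read in fixed-size chunks and each chunk is handed to a callback, so only one chunk is held at a time instead of the whole payload. The sink callback is a hypothetical stand-in for whatever would feed a Java InputStream; the cross-language part is not shown.

```python
import io
from typing import BinaryIO, Callable, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = io.DEFAULT_BUFFER_SIZE) -> Iterator[bytes]:
    """Yield successive chunks; an empty read signals end of stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        # A chunk may be shorter than chunk_size; only an empty read means EOF.
        yield chunk


def pump(stream: BinaryIO, sink: Callable[[bytes], None],
         chunk_size: int = io.DEFAULT_BUFFER_SIZE) -> int:
    """Feed `stream` to `sink` chunk by chunk and return the number of bytes sent."""
    total = 0
    for chunk in iter_chunks(stream, chunk_size):
        sink(chunk)  # hypothetical: the Java side would consume this via its InputStream
        total += len(chunk)
    return total
```

The chunk size is a trade-off: larger chunks mean fewer calls across the language boundary but more memory per call.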
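If we want to support the list-of-objects proposal (byte objects that may be XML files or zip files containing one or more XML files), we first have to detect whether each byte stream is a zip or not (using its magic number); then we can map it to the right DataSource implementation on the Java side.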
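A sketch of the magic-number check, assuming a seekable binary stream. Regular zip archives start with the local file header signature PK\x03\x04; the other two signatures cover empty archives and the spanned-archive marker. looks_like_zip is a hypothetical helper, and a positive result only means the header looks like a zip, not that the archive is valid.

```python
import io
from typing import BinaryIO

# Zip signatures: local file header, end of central directory (empty archive),
# and the marker that starts a spanned/split archive.
ZIP_MAGIC_NUMBERS = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def looks_like_zip(stream: BinaryIO) -> bool:
    """Peek at the next 4 bytes of a seekable stream and restore its position."""
    start = stream.tell()
    head = stream.read(4)
    stream.seek(start, io.SEEK_SET)
    return head in ZIP_MAGIC_NUMBERS
```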
Maybe the GraalVM API helps here: https://www.graalvm.org/sdk/javadoc/index.html?org/graalvm/nativeimage/c/type/CTypeConversion.html

FYI, following discussions with @geofjamg, I am working on supporting io.BytesIO as an input to load a network. This means the content has to be fully loaded in memory, but it should work both with ASCII-based network descriptions and with a zip file given as a memory blob (this requires a pull request on powsybl-core to add an InMemoryZipFileDataSource, as the existing ZipFileDataSource reads from a file). I'll try to investigate a completely streamed pipeline, but zip support will probably be an issue there.
@Haigutus the latest release, 0.24, supports BytesIO for network loading. You can pass load_from_binary_buffers a list of BytesIO objects containing, for instance, individually zipped profiles and boundaries. Let us know if it solves your issue.
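As an illustration of preparing that kind of input with the standard library, the sketch below wraps each profile in its own in-memory zip archive. zip_each_profile and the profile names used in the test are hypothetical; the resulting list of rewound BytesIO objects is the kind of input the comment above describes for load_from_binary_buffers.

```python
import io
import zipfile
from typing import Dict, List


def zip_each_profile(profiles: Dict[str, bytes]) -> List[io.BytesIO]:
    """Wrap each named profile in its own single-entry, in-memory zip archive."""
    buffers: List[io.BytesIO] = []
    for name, content in profiles.items():
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            archive.writestr(name, content)
        buffer.seek(0)  # rewind so the consumer reads the archive from the start
        buffers.append(buffer)
    return buffers
```

Closing the ZipFile does not close a BytesIO that was passed in, so the buffers remain usable after the with block.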
@geofjamg I can confirm the new API works for import; we have been using and testing it for a week and have not encountered any issues, thank you. As for export, should we open another ticket, or will that also be handled under this ticket?
We can let this issue open until export is done
@geofjamg do you have any information on when export to binary buffers could be expected?
Both import and export are implemented, thanks. This issue can be closed