pypowsybl Network loading API: support for binary streams

Do you want to request a feature or report a bug?

Feature.

What is the current behavior?

We have 2 loading methods:

load: takes a file path as argument
load_from_string: takes the content of a file, as a string, as argument. Does not support byte entries.

So, in particular, in-memory "blobs" cannot be provided as arguments. Streaming content is not supported either.

What is the expected behavior?

As proposed in #144, we should support byte streams (file object) as arguments to load.

Type checking and runtime behaviour After some digging, it seems there is not a very standard way of type cheking for file objects, neither at runtime nor at typecheck time. We have typing.BinaryIO and various classes in io module. But io.BinaryIO exposes much more methods than necessary: we could go for a lighter protocol. As an example, pandas lib seems to perform typechecking by using a union of many types (see comment below), and at runtime will only check for the presence of a read method.

--> a good, mixed approach could be to define a simple protocol with at least read method, and check for their presence at runtime.

String input handling

We should deprecate the load_from_string method: users can provide an in memory buffer instead, with io.BytesIO for example.

Once question remains: do we allow only to input binary IO (such as io.BytesIO), or also text IO (such as io.StringIO) ? The latter could be handy for text formats. If we want to distinguish between the 2, there is no standard way ... For example, pandas ends up looking for 'b' character in "mode", and also checking the actual class of the object (against a predefined set of class including io.TextIOWrapper etc).

What is the motivation / use case for changing the behavior?

A use case is being able to load a network from in memory zip content.

May 23 '22 13:05 sylvlecl

For reference, pandas uses those unions for typing file inputs:

# filenames and file-like-objects
Buffer = Union[IO[AnyStr], RawIOBase, BufferedIOBase, TextIOBase, TextIOWrapper, mmap]
FileOrBuffer = Union[str, Buffer[AnyStr]]
FilePathOrBuffer = Union["PathLike[str]", FileOrBuffer[AnyStr]]

May 24 '22 09:05 sylvlecl

I would propose to support only io.BytestIO
The same should be done for export
On the packaging of files I would propose that the function would take in list of binary objects and find all XML-s independent of packaging

Network.load_bytest(list(object1, object2, objectN)

where object could be a XML file, Zip file with single XML inside, Zip file with multiple XML inside

May 31 '22 11:05 Haigutus

any updates on this?

Apr 14 '23 11:04 Haigutus

It seems there should be change done already on JAVA side on things

Currenlty: filePath -> Network.load -> DataSource -> Imporetr find -> Import

Proposal to change Network.load implementation:

if filePath: use current solution
if not filePath and DataSource DataSource -> Network.load -> Importer find -> Import

Then on PY side of things one could implement creation of DataSource from file_like objects so either DataSourece(open(filePath)) or DataSource(io.BytesIO)

This would help integrations in any language as it would remove main filesystem dependacy form the source code

https://github.com/powsybl/powsybl-core/blob/8b2850d803738d471c76b1f6ac7c8903af70e3db/commons/src/main/java/com/powsybl/commons/datasource/DataSourceUtil.java#L65

Apr 30 '23 08:04 Haigutus

An easy solution could be to fully load the byte stream on python side and then transfer as it is to Java side but it would not be memory efficient. The right way (and hardest!) is to somehow connect python BytesIO.read to Java InputStream.read to be able to really stream the bytes from end to end.

May 02 '23 18:05 geofjamg

I would propose to support only io.BytestIO

The same should be done for export

On the packaging of files I would propose that the function would take in list of binary objects and find all XML-s independent of packaging

Network.load_bytest(list(object1, object2, objectN)

where object could be a XML file, Zip file with single XML inside, Zip file with multiple XML inside

If we want to support this, we have to first detect if the byte stream is a zip or not (using magic number) and then we will be able to map it to the right DataSource implementation on Java side.

May 02 '23 18:05 geofjamg

Maybe the GraalVM api helps here with https://www.graalvm.org/sdk/javadoc/index.html?org/graalvm/nativeimage/c/type/CTypeConversion.html

May 03 '23 09:05 Haigutus

FYI following discussions with @geofjamg I am working to support io.BytesIO as an input to load a network. This mean the content have to be fully loaded in memory but it should work with ascii based network description and also zip file as a memory blob (require a pull request on powsybl core to add a InMemoryZipFileDataSource as the existing ZipFileDataSource reload from a file). I'll try to investigate a completely streamed pipeline but it will probably be an issue with zip support.

May 25 '23 14:05 obrix

@Haigutus last release 0.24 supports BytesIO for network loading. You can provides to load_from_binary_buffers a list of BytesIO containing for instance individually zipped profiles and boundaries. Let us known if it solves your issue.

Jul 26 '23 09:07 geofjamg

@geofjamg I can confirm, the new API works for import, we have been using/testing for a week and have not encountered any issues, thankyou. As for export, should we make another ticket or will that also be handled under this ticket?

Aug 03 '23 13:08 Haigutus

We can let this issue open until export is done

Aug 03 '23 14:08 geofjamg

@geofjamg do you have any information, when the export to binary buffers could be expected?

Aug 31 '23 20:08 Haigutus

Both import and export are implemented, thanks. This issue can be closed

Feb 23 '24 07:02 Haigutus

pypowsybl pypowsybl copied to clipboard

Network loading API: support for binary streams

pypowsybl
pypowsybl copied to clipboard