openPMD-api

Iterate readIterations/read_iterations Multiple Times

Status: Open. ax3l opened this issue 2 years ago (7 comments).

Describe the bug

Currently, read_iterations() cannot be looped over more than once. Error message:

openpmd_api.openpmd_api_cxx.ErrorWrongAPIUsage:
  Wrong API usage: Trying to call Series::readIterations() on a (partially) read Series.

This is a bit unusual, since calling it again on the same open Series should start over, at least in regular/random-access mode (non-streaming).

To Reproduce

Python:

import openpmd_api as io

# ...
series = io.Series(filename, io.Access_Type.read_only)

# start-to-end read:
for count, iteration in enumerate(series.read_iterations()):
    pass

# another start-to-end read:
for count, iteration in enumerate(series.read_iterations()):
    pass

Expected behavior

In Python, calling a method that returns a generator/iterator again usually starts the iteration over; reading files this way should behave the same.

Software Environment

  • version of openPMD-api: 0.15.0 & 0.15.1
  • installed openPMD-api via: conda
  • operating system: Linux
  • machine: any
  • name and version of Python implementation: any
  • version of HDF5: hdf5 1.12.2 (nompi_ha7af310_101)
  • version of ADIOS1: N/A
  • version of ADIOS2: N/A
  • name and version of MPI: none

Additional context

First seen by @s-sajid-ali.

https://github.com/fnalacceleratormodeling/synergia2/blob/231d3dff97c0a2bb64db49584c626ec15f7b24b4/src/analysis_tools/diag_plot_openpmd.py

ax3l, Apr 05 '23
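The expected behavior described in the issue rests on a common Python convention: a generator object is exhausted after one pass, but calling the function that created it again yields a fresh generator that starts from the beginning. The minimal plain-Python sketch below illustrates only that convention; iterate_steps is a hypothetical stand-in and not part of openPMD-api.

```python
def iterate_steps(steps):
    """Hypothetical stand-in for a method such as read_iterations():
    each call returns a new generator over the same data."""
    for step in steps:
        yield step


steps = [0, 1, 2]

gen = iterate_steps(steps)
first_pass = list(gen)
second_pass_same_gen = list(gen)  # the same generator object is already exhausted

fresh_pass = list(iterate_steps(steps))  # a new call starts over from the beginning

print(first_pass, second_pass_same_gen, fresh_pass)  # [0, 1, 2] [] [0, 1, 2]
```

In openPMD-api 0.15.x, the second call to read_iterations() raises ErrorWrongAPIUsage instead, which is the behavior this issue reports.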

A workaround is to use the traditional API:

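import openpmd_api as io

series = io.Series(filename, io.Access_Type.read_only)
# ...

# series.iterations is a random-access container, so it can be traversed repeatedly
for k_i, i in series.iterations.items():
    pass
for k_i, i in series.iterations.items():
    pass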

ax3l, Apr 05 '23

I would say that this currently has the status of a feature request rather than a bug ;) If anything, it was a bug that this workflow somehow worked in 0.14. Series::readIterations() is currently intended for workflows that would also be usable in streaming. Looping with for it in series.read_iterations() is not a lightweight operation: it goes through the different IO steps in the backend. For lightweight access such as reading attributes, for k_i, i in series.iterations.items() is not a workaround but the better choice of API.

Supporting repeated calls to read_iterations() is one of my goals for the 0.16 release cycle, but it will be an API addition that requires new internal workflows and additions in the backend, rather than a quick adaptation of the public API.

franzpoeschel, Apr 05 '23
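When data does come from a single-pass source such as read_iterations(), the usual Python pattern is to extract the lightweight values you need during that one pass and loop over the cached results afterwards. The sketch below assumes a hypothetical generator stream_iterations in place of series.read_iterations() and plain dicts in place of Iteration objects. In real code, the attribute reads would happen inside the read_iterations() loop.

```python
from typing import Dict, Iterator, List, Tuple


def stream_iterations(data: Dict[int, dict]) -> Iterator[Tuple[int, dict]]:
    """Hypothetical read-once source standing in for series.read_iterations():
    yields (index, record) pairs in ascending order, exhausted after one pass."""
    for index in sorted(data):
        yield index, data[index]


def collect_times(source: Iterator[Tuple[int, dict]]) -> List[Tuple[int, float]]:
    """Single pass: keep only the lightweight values needed later."""
    return [(index, record["time"]) for index, record in source]


# Hypothetical stand-in for the per-iteration attributes stored in a file.
fake_file = {2: {"time": 1.0}, 0: {"time": 0.0}, 1: {"time": 0.5}}
times = collect_times(stream_iterations(fake_file))

# The cached list can be looped over as often as needed.
total = sum(t for _, t in times)
latest = max(times, key=lambda pair: pair[1])
print(times, total, latest)  # [(0, 0.0), (1, 0.5), (2, 1.0)] 1.5 (2, 1.0)
```

Caching plain values rather than the iteration handles avoids relying on an iteration staying open after the loop has moved on to the next step.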

@s-sajid-ali just checking: I remember you wrote the file with HDF5. When you wrote it, did you use the groupBased or the fileBased encoding for iterations?

ax3l, Apr 05 '23

@s-sajid-ali, where did you find read_iterations in the docs/examples? We just want to make sure we don't accidentally advertise it outside of streaming (read-once) workflows yet :)

ax3l, Apr 05 '23

.. I remember you wrote the file with HDF5. When you wrote the file, did you use for iterations the groupBased encoding, the fileBased encoding?

I used groupBased encoding:

sasyed@MAC-140753 ~/D/p/s/b/e/fodo_cxx (sajid/openpmd_python_api_fixes)> h5glance diag.h5                                                                               
diag.h5 (10 attributes)
└data
  ├0 (20 attributes)
  ├1 (20 attributes)
  ├2 (20 attributes)
  ├3 (20 attributes)
  ├4 (20 attributes)
  └5 (20 attributes)

sasyed@MAC-140753 ~/D/p/s/b/e/fodo_cxx (sajid/openpmd_python_api_fixes)>  

... where you found read_iterations in the docs/examples - we just want to make sure we don't accidentally advertise it outside of streaming (read-once) workflows yet :)

Likely from this example: https://openpmd-api.readthedocs.io/en/0.15.1/usage/parallel.html#id2 or from inspecting the available methods for a Series object in a Jupyter notebook and realizing that read_iterations worked for the use case I had (at least with openpmd-api@:0.15.0).

s-sajid-ali, Apr 05 '23
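The h5glance listing above shows the groupBased layout: every iteration is a group under /data/ in the single file diag.h5. By default, openPMD-api chooses the encoding for writing from the file name. A name containing the expansion pattern %T (optionally zero-padded, e.g. %06T) gives fileBased encoding with one file per iteration, while a plain name gives groupBased. The helpers below are hypothetical and only mimic that naming rule in plain Python; they are not part of openPMD-api.

```python
import re

# Hypothetical helper pattern, not part of openPMD-api: matches the %T
# expansion pattern, optionally with a zero-padding width such as %06T.
_EXPANSION = re.compile(r"%(\d*)T")


def guess_encoding(filename: str) -> str:
    """Guess the default iteration encoding implied by a filename:
    fileBased if it contains %T, groupBased otherwise."""
    return "fileBased" if _EXPANSION.search(filename) else "groupBased"


def expand(filename: str, iteration: int) -> str:
    """Fill in the %T pattern for one iteration index (fileBased naming)."""
    def repl(match):
        width = match.group(1)  # "" for %T, "06" for %06T
        return str(iteration).zfill(int(width)) if width else str(iteration)
    return _EXPANSION.sub(repl, filename)


print(guess_encoding("diag.h5"))       # groupBased
print(guess_encoding("diag_%06T.h5"))  # fileBased
print(expand("diag_%06T.h5", 5))       # diag_000005.h5
print(expand("diag_%T.h5", 5))         # diag_5.h5
```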

I see, yes the comment

    # In parallel contexts, it's important to explicitly open iterations.
    # This is done automatically when using `Series.write_iterations()`,
    # or in read mode `Series.read_iterations()`.

is misleading; we need to update it.

ax3l, Apr 10 '23

#1592 is a first step in this direction: it supports reopening closed Iterations and going back to earlier Iterations in Series.readIterations(). This is currently restricted to Series that don't use ADIOS2 steps, since those would require closing and reopening in the backend.

franzpoeschel, Aug 07 '24