openPMD-api
Iterate readIterations/read_iterations Multiple Times
Describe the bug
Currently, read_iterations() cannot be looped multiple times.
Error message:
openpmd_api.openpmd_api_cxx.ErrorWrongAPIUsage:
Wrong API usage: Trying to call Series::readIterations() on a (partially) read Series.
This is a bit unusual, since it should start over on the same open series, at least in regular/random access (non-streaming) mode.
To Reproduce
Python:
import openpmd_api as io
# ...
series = io.Series(filename, io.Access_Type.read_only)
# start-to-end read:
for count, iteration in enumerate(series.read_iterations()):
    pass
# another start-to-end read:
for count, iteration in enumerate(series.read_iterations()):
    pass
Expected behavior
In Python, asking for a generator/iterator again usually starts the iteration over, so a second read_iterations() loop on the same open series should begin at the first iteration again.
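For reference, the Python convention the expectation is based on can be sketched without openPMD-api at all. The generator function `numbers` below is a made-up stand-in, not part of any library: calling it again yields a fresh pass, while re-using one already consumed generator object does not.

```python
def numbers():
    """Hypothetical generator function: each call returns a fresh generator."""
    yield from (0, 1, 2)


# Calling the function twice gives two independent start-to-end passes.
first = list(numbers())
second = list(numbers())

# Re-using a single generator object does not start over: it is exhausted.
gen = numbers()
once = list(gen)
again = list(gen)

print(first, second, once, again)
```

The report expects read_iterations() to behave like the first case (a new call, a new pass); the error above shows that in 0.15.x it does not.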
Software Environment
- version of openPMD-api: 0.15.0 & 0.15.1
- installed openPMD-api via: conda
- operating system: Linux
- machine: any
- name and version of Python implementation: any
- version of HDF5: hdf5 1.12.2 (nompi_ha7af310_101)
- version of ADIOS1: N/A
- version of ADIOS2: N/A
- name and version of MPI: none
Additional context
First seen by @s-sajid-ali.
https://github.com/fnalacceleratormodeling/synergia2/blob/231d3dff97c0a2bb64db49584c626ec15f7b24b4/src/analysis_tools/diag_plot_openpmd.py
Work-around is to use the traditional API:
series = io.Series(filename, io.Access_Type.read_only)
# ...
for k_i, i in series.iterations.items():
    pass
for k_i, i in series.iterations.items():
    pass
I would say that this currently has the status of a feature request, rather than a bug ;) If anything, it was a bug that this workflow did somehow function in 0.14.
Series::readIterations() is currently intended for workflows that would also be usable in streaming.
Looping with for it in series.read_iterations() is not a lightweight operation: it goes through the individual IO steps in the backend.
For lightweight access such as reading attributes, for k_i, i in series.iterations.items() is not a workaround, but the better choice of API.
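A minimal sketch of that recommendation, assuming only the `series.iterations.items()` call quoted in this thread. The helper names `iteration_indices` and `read_twice` are hypothetical, and the functions are duck-typed so they touch nothing else in the openPMD-api surface:

```python
def iteration_indices(series):
    """Return the iteration indices of an open series, in container order.

    Relies only on `series.iterations.items()`, as shown in this thread.
    """
    return [index for index, _iteration in series.iterations.items()]


def read_twice(series):
    """Do two independent start-to-end passes over the same open series."""
    first = iteration_indices(series)
    second = iteration_indices(series)
    return first, second


# Intended use (not executed here; `filename` is a placeholder):
#   import openpmd_api as io
#   series = io.Series(filename, io.Access_Type.read_only)
#   first, second = read_twice(series)
```

Since each pass only walks the container of the already opened series, the second pass should not hit the "(partially) read Series" error that a second read_iterations() call raises.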
Supporting the workflow of calling read_iterations() multiple times is one of my goals for the 0.16 release cycle, but it will have the character of an API addition and will require new internal workflows and additions in the backend, rather than a quick adaptation in the public API.
@s-sajid-ali just checking: I remember you wrote the file with HDF5. When you wrote the file, did you use the groupBased or the fileBased encoding for iterations?
@s-sajid-ali just checking where you found read_iterations in the docs/examples - we just want to make sure we don't accidentally advertise it outside of streaming (read-once) workflows yet :)
... I remember you wrote the file with HDF5. When you wrote the file, did you use the groupBased or the fileBased encoding for iterations?
I used groupBased encoding:
sasyed@MAC-140753 ~/D/p/s/b/e/fodo_cxx (sajid/openpmd_python_api_fixes)> h5glance diag.h5
diag.h5 (10 attributes)
└data
├0 (20 attributes)
├1 (20 attributes)
├2 (20 attributes)
├3 (20 attributes)
├4 (20 attributes)
└5 (20 attributes)
... where you found read_iterations in the docs/examples - we just want to make sure we don't accidentally advertise it outside of streaming (read-once) workflows yet :)
Likely from this example: https://openpmd-api.readthedocs.io/en/0.15.1/usage/parallel.html#id2 or from inspecting the available methods for a Series object in a Jupyter notebook and realizing that read_iterations worked for the use case I had (at least with openpmd-api@:0.15.0).
I see, yes the comment
# In parallel contexts, it's important to explicitly open iterations.
# This is done automatically when using `Series.write_iterations()`,
# or in read mode `Series.read_iterations()`.
is misleading, we need to update that.
#1592 brings a first step in this direction. It supports re-opening closed Iterations and going back to earlier Iterations in Series.readIterations(). This is currently restricted to Series which don't use ADIOS2 steps since those will require closing and reopening in the backend.