openPMD-standard
Proposed: gridDataOrder attribute
Situation: With HDF5, the data of a grid (a multidimensional array) is stored in the data file in the same order as it is stored in memory. That is, Fortran writes the data in column-major order, while C/C++ writes it in row-major order. The standard therefore needs to address how to handle this.
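To make the row-major versus column-major distinction concrete, here is a minimal sketch in plain Python. The function names and the example shape are made up for illustration; nothing here is part of the standard or of any HDF5 binding.

```python
def flat_index_c(idx, shape):
    """Row-major (C) offset: the last index varies fastest."""
    offset = 0
    for i, n in zip(idx, shape):
        offset = offset * n + i
    return offset


def flat_index_f(idx, shape):
    """Column-major (Fortran) offset: the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(idx), reversed(shape)):
        offset = offset * n + i
    return offset


shape = (2, 3, 4)  # hypothetical (Nx, Ny, Nz)
idx = (1, 0, 2)    # (ix, iy, iz)

# The same logical element lands at different offsets in the two layouts.
offset_c = flat_index_c(idx, shape)
offset_f = flat_index_f(idx, shape)

# An F-ordered array of shape (Nx, Ny, Nz) occupies the same positions as a
# C-ordered array of shape (Nz, Ny, Nx) addressed with reversed indices.
same_bytes = offset_f == flat_index_c(idx[::-1], shape[::-1])
```

This is why a file written from Fortran shows up with a reversed shape when it is opened through a C-ordered view.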
Proposed: For the Beam Physics extension (at least), have a gridDataOrder attribute that may be set to C or F to reflect whether field grid data is stored in row-major (C) or column-major (F) order. Grid metadata, for example the three numbers that give the distance between grid points in x, y, and z, is stored in the logical order specified by the Beam Physics extension. Currently, the logical order is either (x,y,z) or (r,theta,z) as appropriate. The order of the metadata is independent of how the grid data storage is ordered. That is, the metadata order is independent of the setting of gridDataOrder.
Reasoning:
- It simplifies the handling of metadata. No metadata order reversals are needed. Additionally, for the specific instance of the gridOriginOffset metadata, this offset is always specified using (x,y,z) Cartesian coordinates independent of what the field grid coordinates are. It would be confusing to write this as (z,y,x) when a (r,theta,z) grid is stored in reverse order.
- It simplifies checking if the grid data needs to be transposed. Here only one attribute needs to be looked at. If, on the other hand, axis labels need to be checked, this is more complicated.
Notes:
- gridDataOrder would be stored at the component level along with unitDimension, etc.
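As a rough sketch of the reader-side logic this proposal implies, the check below looks at a single attribute. The dict container, the function name, and the numeric values are assumptions made for illustration; they are not an openPMD-api interface.

```python
def needs_transpose(component_attrs, reader_order):
    """Return True if the stored grid must be transposed for this reader.

    component_attrs: dict of attributes read from a record component
                     (hypothetical container, not an openPMD-api type).
    reader_order:    "C" or "F", the in-memory order the reader expects.
    """
    stored_order = component_attrs["gridDataOrder"]
    if stored_order not in ("C", "F"):
        raise ValueError(f"unexpected gridDataOrder: {stored_order!r}")
    return stored_order != reader_order


# Metadata stays in logical (x, y, z) order whatever the storage order is.
attrs = {
    "gridDataOrder": "F",
    "gridSpacing": (0.1, 0.2, 0.5),        # (dx, dy, dz), made-up values
    "gridOriginOffset": (0.0, 0.0, -1.0),  # Cartesian (x, y, z), made-up values
}

transpose_for_c_reader = needs_transpose(attrs, "C")
transpose_for_f_reader = needs_transpose(attrs, "F")
```

Under this scheme the reader never touches the metadata order; only the data array itself may need a transpose.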
I agree with this. This allows the developer to choose to transpose the data or the metadata as they prefer. Note that low-level tools like HDFView and h5dump will report reversed shapes for F ordered data.
We discussed this further yesterday and might want to avoid situations like this:
low-level tools like HDFView and h5dump will report reversed shapes for F ordered data
Exactly, when comparing HDF5/ISO-C binding/memory-view attributes and shape, meta-data reversal is needed one way or another when dealing with C/F ordering.
Fixing (meta/grid)dataOrder='C' ensures a consistent view in low-level tools. Otherwise, with (meta/grid)dataOrder = 'F', the h5ls/h5dump output for a record that is named record[x,y,z] in Fortran looks like this:
- data indexing: [iz, iy, ix]
- shape: [Nz, Ny, Nx]
- axisLabel: ["x", "y", "z"]
- offsets: [offsetX, offsetY, offsetZ]
- spacing: [spacingX, spacingY, spacingZ]
This only affects these 4 openPMD attributes in the base standard.
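The mismatch listed above can be reproduced with a small sketch. The function name and the example extents are hypothetical; the point is only that a C-ordered tool reports the reversed shape while the labels stay as written.

```python
def c_tool_view(fortran_shape, axis_labels):
    """Pair each stored axis label with the extent a C-ordered tool reports.

    The bytes on disk are unchanged, so the reported shape is the Fortran
    shape reversed, while the axis labels are left exactly as written.
    """
    reported_shape = tuple(reversed(fortran_shape))
    return list(zip(axis_labels, reported_shape))


# Hypothetical Fortran array record(Nx=4, Ny=3, Nz=2) labeled x, y, z.
pairs = c_tool_view((4, 3, 2), ["x", "y", "z"])

# Reading label i together with extent i now pairs "x" with Nz and "z" with Nx.
x_label_extent = pairs[0][1]
z_label_extent = pairs[2][1]
```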
The logic with a fixed gridDataOrder (=C) is minimal and a one-liner when reading into or writing from F-ordered indices:
https://github.com/ECP-WarpX/WarpX/blob/20.11/Source/Diagnostics/WarpXOpenPMD.cpp#L682-L689
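A minimal sketch of what such a reversal can look like, written in Python rather than copied from the linked C++ code. The attribute names are assumed to match the per-axis attributes of the base standard, and the dict container and values are hypothetical.

```python
# Per-axis attributes assumed for this sketch; the dict container holding
# them is hypothetical and not an openPMD-api type.
PER_AXIS = ("axisLabels", "gridSpacing", "gridGlobalOffset")


def reverse_per_axis(meta):
    """Return a copy of meta with every per-axis attribute reversed."""
    return {
        key: list(reversed(value)) if key in PER_AXIS else value
        for key, value in meta.items()
    }


fortran_side = {
    "axisLabels": ["x", "y", "z"],
    "gridSpacing": [0.1, 0.2, 0.5],        # made-up values
    "gridGlobalOffset": [0.0, 0.0, -1.0],  # made-up values
    "geometry": "cartesian",               # not per-axis, left untouched
}

# Metadata as it would be written next to C-ordered data.
c_side = reverse_per_axis(fortran_side)
```

Applying the same function again restores the Fortran-side view, so one helper covers both reading and writing.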
Referencing #194 #125 #129 as we decided to simplify the logic together. Please also recap the reports and comments in it again together, especially the comments of people that implement flexible tools like the openPMD-viewer, the -api, etc.
A longer background explanation for the current implementation: mozilla.pdf
Hi, I am going a bit off-topic with this, but I think one could in general decouple the data format used to store data from the representation in memory. This is left to the user for now, and maybe it should be.
There are however many relevant use cases for both particle and mesh data where these two representations differ.
My current solution to this would be to add meta data to describe data layout and create a high level API that looks more object-like to the user (= 'give me the positions of a million particles' rather than 'read three million SOA entries') that uses this meta data.
Gist: What you propose here could be relevant at other places in the standard and could become a common approach to making openPMD truly independent of data representation in storage.
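To illustrate the kind of object-like access described in this comment, here is a small sketch. The function name, the layout argument, and the dict-of-lists storage are all assumptions made for illustration and do not correspond to an existing openPMD API.

```python
def particle_positions(soa, layout=("x", "y", "z")):
    """Return one (x, y, z) tuple per particle from struct-of-arrays storage.

    soa:    dict mapping component name to a sequence of values
            (hypothetical stand-in for stored record components).
    layout: layout metadata naming the components that form a position.
    """
    columns = [soa[name] for name in layout]
    return list(zip(*columns))


# Two particles stored component by component (made-up values).
stored = {
    "x": [0.0, 1.0],
    "y": [0.5, 1.5],
    "z": [2.0, 3.0],
}

positions = particle_positions(stored)
```

The caller asks for positions and never sees how the components are arranged in storage; only the layout metadata needs to know.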
Resolved today at IPAC with @DavidSagan:
We solve this by removing dataOrder in 2.0 and introducing a convention for the axisLabels lightweight metadata. All other attributes (gridSpacing, gridOffset, etc.) will follow the same order as axisLabels.
https://github.com/openPMD/openPMD-standard/blob/upcoming-2.0.0/STANDARD.md#mesh-based-records
This will simplify readers. Readers generally compare their own data order against the one defined in the file to decide whether a transpose is needed, and they no longer need to check an additional attribute when doing so.
Example with 1.0 was for a Fortran reader:
    def mesh_needs_transpose(mesh):
        """VTK meshes are in order FortranArray[x,y,z].
        openPMD supports labeling nD arrays."""
        # the openPMD v1.0.*/1.1.* attribute dataOrder describes if metadata is to be inverted
        meta_data_in_C = mesh.data_order == "C"
        first_axis_label = mesh.axis_labels[0] if meta_data_in_C else mesh.axis_labels[-1]
        last_axis_label = mesh.axis_labels[-1] if meta_data_in_C else mesh.axis_labels[0]
        # common for 1D and 2D data in openPMD from accelerator physics: axes labeled as x-z and only z
        if (first_axis_label == "x" and last_axis_label == "z"):
            return False
        else:
            return True
See https://gitlab.kitware.com/paraview/paraview/-/merge_requests/6837. In 2.0 we can remove the line meta_data_in_C = mesh.data_order == "C" and the conditionals that depend on it.
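A sketch of how the 2.0 check might look once the dataOrder lookup is dropped. It keeps the same condition as the 1.0 example; the function name and the SimpleNamespace stand-in for a mesh object are assumptions for illustration, not the ParaView reader's actual code.

```python
from types import SimpleNamespace


def mesh_needs_transpose_v2(mesh):
    """Sketch of the 2.0 check: axis_labels already follow the data order,
    so no dataOrder attribute has to be consulted."""
    first_axis_label = mesh.axis_labels[0]
    last_axis_label = mesh.axis_labels[-1]
    # Same condition as the 1.0 example, minus the dataOrder branch.
    return not (first_axis_label == "x" and last_axis_label == "z")


# Hypothetical mesh objects exposing only the attribute the check reads.
mesh_xyz = SimpleNamespace(axis_labels=["x", "y", "z"])
mesh_zyx = SimpleNamespace(axis_labels=["z", "y", "x"])

xyz_needs_transpose = mesh_needs_transpose_v2(mesh_xyz)
zyx_needs_transpose = mesh_needs_transpose_v2(mesh_zyx)
```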