openPMD-standard
Allow user to store non-openPMD information
Today I had a request from a prospective user of openPMD: can the user store other datasets in the HDF5-openPMD files that are not meant to be read by the openPMD parser / viewer (for instance because they fall neither into the category mesh nor into the category particle)?
From my point of view, the best way to do this is to store this data outside of the basePath (in this case the parser will not try to read it).
@ax3l : Does that make sense ? If yes, I think it would be good to add it as a side-remark in the standard.
Absolutely, they can store it everywhere as long as it does not collide with reserved attributes. The standard is not exclusive and only defines a minimal required markup :)
That is also the way how new extensions (might) develop: from best practices of existing additional attributes & records.
Regarding other data sets (records): if they are stored in the base path + particle/mesh path, they need to fulfill the requirements of those locations. Otherwise, just add them to another non-reserved location inside the base path and it will be fine, since we don't search outside for standardized formatting.
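To make the distinction between these locations concrete, here is a minimal sketch. It assumes the openPMD default `basePath` of `/data/%T/` and takes `meshes/` and `particles/` as the values of `meshesPath` and `particlesPath`; the helper `location_kind` is a made-up name for illustration, not part of any openPMD API.

```python
import re

BASE_PATH = "/data/%T/"         # basePath template, %T = iteration number
MESHES_PATH = "meshes/"         # assumed value of the meshesPath attribute
PARTICLES_PATH = "particles/"   # assumed value of the particlesPath attribute


def location_kind(path):
    """Classify an absolute dataset path relative to the openPMD layout."""
    # Build a regex from the template: %T stands for an unsigned integer.
    prefix = r"\d+".join(re.escape(part) for part in BASE_PATH.split("%T"))
    match = re.match(prefix, path)
    if match is None:
        return "outside basePath"
    rest = path[match.end():]
    if rest.startswith(MESHES_PATH):
        return "mesh record"          # must follow the mesh rules
    if rest.startswith(PARTICLES_PATH):
        return "particle record"      # must follow the particle rules
    return "inside basePath, non-reserved"


for p in ["/data/100/meshes/E/x", "/data/100/my_notes/run_log", "/extra/notes"]:
    print(p, "->", location_kind(p))
```

Only the first two kinds carry formatting requirements; everything else is free-form as far as this discussion is concerned.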
Does that make sense ? If yes, I think it would be good to add it as a side-remark in the standard.
Makes absolute sense, good point. We should make this clearer in the standard.
I marked it as a revision change (patch level), which means it can be added in, e.g., a 1.0.1 release (as you already marked!).
Ok, great! I'll do a corresponding PR in the next few days.
Another, additional/orthogonal approach to allow non-openPMD information inside basePath too: if one wants to avoid parsing of additional records, in detail
- directories
- data sets
(since additional attributes do not harm the parsers), we could also provide a prefix that is ignored by the parsers. Let's say their names must start with "+". But maybe that is messy.
This would allow experimenting with new additional information, e.g., irregular mesh geometries, inside basePath.
Currently, only attributes can be freely added at any place (well, it's recommended to name them comment). Groups and data sets are restricted inside those paths.
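A rough sketch of how a reader could honor such an ignore prefix. The "+" marker is only the proposal from this comment, not something the standard defines, and `split_children` is a hypothetical helper name.

```python
IGNORE_PREFIX = "+"  # proposed in this thread, NOT part of the openPMD standard


def split_children(names):
    """Separate the child names a parser would read from those it would skip."""
    parsed = [n for n in names if not n.startswith(IGNORE_PREFIX)]
    skipped = [n for n in names if n.startswith(IGNORE_PREFIX)]
    return parsed, skipped


# Children of a meshes group: two regular records and one experimental group.
parsed, skipped = split_children(["E", "B", "+irregular_geometry"])
print(parsed)   # ['E', 'B']
print(skipped)  # ['+irregular_geometry']
```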
implemented in 1.0.1
@ax3l
Currently, only attributes can be freely added at any place (well, it's recommended to name them comment). Groups and data sets are restricted inside those paths.
I would propose that groups and data sets be allowed to be freely added. I can imagine situations where, for example, a person wants to add per-particle data, and the restriction that this has to be put outside of the group that holds the particle data makes things very messy. Certainly we could mandate that such added groups or data sets be marked as extra, for example using a "+" prefix as you suggested. I think this is a fairly clean solution.
Alternative (and maybe more radical) suggestion:
- Allow custom group hierarchies with custom datasets and custom attributes inside every iteration
- Treat meshes and particles as keywords, reserved to openPMD (to be more precise: whatever is defined in meshesPath and particlesPath)
- Inside these paths, the typical openPMD hierarchy applies, and all data should follow strictly the openPMD standard
The fundamental idea would be that an openPMD dataset can not only (1) be augmented by custom hierarchies (i.e. have the classical openPMD hierarchy, and other stuff around it that the API ignores), but that (2) an openPMD dataset is a custom hierarchy with the classical openPMD structure embedded into it at any place. Instead of ignoring custom hierarchies, openPMD could then benefit from and interact with them.
Example: Mesh refinement currently works via the naming of the meshes. Alternatively, one could do:
/data/0/refined_mesh_levels/0/meshes/E
/data/0/refined_mesh_levels/0/meshes/B
/data/0/refined_mesh_levels/1/meshes/E
/data/0/refined_mesh_levels/1/meshes/B
/data/0/refined_mesh_levels/2/meshes/E
/data/0/refined_mesh_levels/2/meshes/B
+++++++ ––––––––––––––––––––– ++++++++
standard       custom         standard
/data/0/simulation_internal/some_checkpointing_info
+++++++ –––––––––––––––––––––––––––––––––––––––––––
standard                  custom
Codes such as, for example, PIConGPU can put their internal datasets (e.g. PIConGPU_id_provider) anywhere in that hierarchy, and it would be ignored instead of cluttering the openPMD dataset.
Ideally, if done correctly, this would mean that a single dataset can use several standards at the same time, such as mixing Nexus with openPMD.
Downside: No huge change for the standard, but a rather large change for implementations. Readers would need to be updated to find openPMD structures throughout the datasets.
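To give a feeling for what "finding openPMD structures throughout the dataset" could mean for a reader, here is a small sketch that walks a nested dict standing in for one iteration's group hierarchy. The dict layout, the `RESERVED` set and the function name are assumptions made for illustration; a real reader would walk HDF5 or ADIOS groups instead.

```python
RESERVED = {"meshes", "particles"}  # assumed meshesPath / particlesPath names


def find_openpmd_groups(group, prefix=""):
    """Yield the path of every reserved group found anywhere below `group`."""
    for name, child in group.items():
        path = f"{prefix}/{name}"
        if name in RESERVED:
            yield path  # openPMD rules apply below this point, stop descending
        elif isinstance(child, dict):
            yield from find_openpmd_groups(child, path)  # custom group


# Stand-in for the hierarchy from the example above (leaves abbreviated).
iteration = {
    "refined_mesh_levels": {
        "0": {"meshes": {"E": {}, "B": {}}},
        "1": {"meshes": {"E": {}, "B": {}}},
    },
    "simulation_internal": {"some_checkpointing_info": None},
}

found = list(find_openpmd_groups(iteration, "/data/0"))
print(found)
# ['/data/0/refined_mesh_levels/0/meshes', '/data/0/refined_mesh_levels/1/meshes']
```

The custom `simulation_internal` group is visited but produces no result, which matches the idea that custom content is simply passed over.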
That sounds useful and would be equivalent to relaxing meshes path from values like meshesPath="meshes/" to regexes such as meshesPath=".*meshes/" (or the hard-coding of this exact regex in the standard).
I am not sure if we will not need an "exclude this from parsing" nonetheless via an attribute on custom groups/variables - without it we would keep things definitely fully separate besides sharing an iteration/snapshot id (if that works then that is fine).
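The relaxed meshesPath regex from this comment can be tried out on the paths of the mesh refinement example. This is only a sketch: the `/data/<iteration>/` prefix is the usual basePath layout, and `mesh_name` is a hypothetical helper.

```python
import re

BASE = r"/data/(?P<iteration>\d+)/"   # usual basePath layout, assumed here
MESHES_PATH = r".*meshes/"            # relaxed meshesPath from the comment

PATTERN = re.compile(BASE + "(?P<prefix>" + MESHES_PATH + ")(?P<record>[^/]+)")


def mesh_name(path):
    """Return the mesh record name if `path` sits directly under a meshesPath match."""
    match = PATTERN.fullmatch(path)
    return match.group("record") if match else None


print(mesh_name("/data/0/meshes/E"))                        # E
print(mesh_name("/data/0/refined_mesh_levels/1/meshes/B"))  # B
print(mesh_name("/data/0/simulation_internal/some_checkpointing_info"))  # None
```

One thing this makes visible: `.*meshes/` also matches a group such as `not_meshes/`, so a hard-coded regex would probably want to anchor on a path separator.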
For the HELPMI project, I drew up some visualizations of the proposed addition.
openPMD currently:
Proposed extension:
That sounds useful and would be equivalent to relaxing meshes path from values like meshesPath="meshes/" to regexes such as meshesPath=".*meshes/" (or the hard-coding of this exact regex in the standard).
Using a regex is one of the options, yes. Another (more restricted) option would be a list of paths. Even though it's redundant, I would even suggest a list of patterns, as that is a common workflow in file managing software?
I am not sure if we will not need an "exclude this from parsing" nonetheless via an attribute on custom groups/variables - without it we would keep things definitely fully separate besides sharing an iteration/snapshot id (if that works then that is fine).
Using exclude patterns is a common enough pattern in a lot of software (rsync, git ignore, backup software, …), so, I'm fine with using that. I don't understand what you mean by "without it we would keep things definitely fully separate besides sharing an iteration/snapshot id"?
Sounds great. Designing as lists of patterns/paths is a good idea.
The last comment was simply: yes, I think we need an exclude pattern, too (as in rsync, git ignore, backup software, ...).
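A possible shape for "lists of patterns plus an exclude list", sketched with shell-style globs from Python's `fnmatch` module. Both list names and their contents are made up for this example, and nothing like them exists in the standard today.

```python
from fnmatch import fnmatchcase

# Hypothetical configuration, paths are relative to the iteration group.
MESHES_PATTERNS = ["meshes/*", "*/meshes/*"]
EXCLUDE_PATTERNS = ["scratch/*"]


def is_mesh_candidate(relative_path):
    """True if the path should be parsed as openPMD mesh data."""
    # Exclusion wins over inclusion, as in rsync or .gitignore style tools.
    if any(fnmatchcase(relative_path, p) for p in EXCLUDE_PATTERNS):
        return False
    return any(fnmatchcase(relative_path, p) for p in MESHES_PATTERNS)


for p in ["meshes/E", "refined_mesh_levels/0/meshes/E", "scratch/meshes/tmp"]:
    print(p, is_mesh_candidate(p))
```

Note that `*` in `fnmatch` also matches `/`, so `*/meshes/*` covers arbitrarily deep custom hierarchies.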
Real-life WIP example from PIConGPU: Checkpointing information is stored under picongpu_internal/, the RNGProvider is a field inside that group (normal openPMD markup), idProvider contains two non-openPMD datasets.
float /data/1000/fields/B/x {64, 64, 64}
float /data/1000/fields/B/y {64, 64, 64}
float /data/1000/fields/B/z {64, 64, 64}
float /data/1000/fields/Convolutional PML B/xy {1, 1, 198144}
float /data/1000/fields/Convolutional PML B/xz {1, 1, 198144}
float /data/1000/fields/Convolutional PML B/yx {1, 1, 198144}
float /data/1000/fields/Convolutional PML B/yz {1, 1, 198144}
float /data/1000/fields/Convolutional PML B/zx {1, 1, 198144}
float /data/1000/fields/Convolutional PML B/zy {1, 1, 198144}
float /data/1000/fields/Convolutional PML E/xy {1, 1, 198144}
float /data/1000/fields/Convolutional PML E/xz {1, 1, 198144}
float /data/1000/fields/Convolutional PML E/yx {1, 1, 198144}
float /data/1000/fields/Convolutional PML E/yz {1, 1, 198144}
float /data/1000/fields/Convolutional PML E/zx {1, 1, 198144}
float /data/1000/fields/Convolutional PML E/zy {1, 1, 198144}
float /data/1000/fields/E/x {64, 64, 64}
float /data/1000/fields/E/y {64, 64, 64}
float /data/1000/fields/E/z {64, 64, 64}
float /data/1000/particles/e/momentum/x {55401}
float /data/1000/particles/e/momentum/y {55401}
float /data/1000/particles/e/momentum/z {55401}
uint64_t /data/1000/particles/e/particlePatches/extent/x {1}
uint64_t /data/1000/particles/e/particlePatches/extent/y {1}
uint64_t /data/1000/particles/e/particlePatches/extent/z {1}
uint64_t /data/1000/particles/e/particlePatches/numParticles {1}
uint64_t /data/1000/particles/e/particlePatches/numParticlesOffset {1}
uint64_t /data/1000/particles/e/particlePatches/offset/x {1}
uint64_t /data/1000/particles/e/particlePatches/offset/y {1}
uint64_t /data/1000/particles/e/particlePatches/offset/z {1}
float /data/1000/particles/e/position/x {55401}
float /data/1000/particles/e/position/y {55401}
float /data/1000/particles/e/position/z {55401}
int32_t /data/1000/particles/e/positionOffset/x {55401}
int32_t /data/1000/particles/e/positionOffset/y {55401}
int32_t /data/1000/particles/e/positionOffset/z {55401}
float /data/1000/particles/e/weighting {55401}
char /data/1000/picongpu_internal/fields/RNGProvider3XorMin {64, 64, 1536}
uint64_t /data/1000/picongpu_internal/idProvider/nextId {1, 1, 1}
uint64_t /data/1000/picongpu_internal/idProvider/startId {1, 1, 1}
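As a sketch of how a reader might sort such a listing, the snippet below parses a few of the lines above and labels each dataset. It assumes that `fields` and `particles` act as the reserved group names in this file and that they may appear at any depth below the iteration; the `classify` helper is hypothetical.

```python
import re

# A few lines copied from the listing above.
LISTING = """\
float /data/1000/fields/Convolutional PML B/xy {1, 1, 198144}
float /data/1000/particles/e/weighting {55401}
char /data/1000/picongpu_internal/fields/RNGProvider3XorMin {64, 64, 1536}
uint64_t /data/1000/picongpu_internal/idProvider/nextId {1, 1, 1}
"""

# "<type> <path> {<extent>}"; the path itself may contain spaces.
LINE = re.compile(r"^(\S+) (/.+) \{(.*)\}$")


def classify(line):
    """Return (path, kind, shape) for one listing line."""
    dtype, path, extent = LINE.match(line).groups()
    shape = tuple(int(n) for n in extent.split(", "))
    groups = path.strip("/").split("/")[2:-1]  # drop "data", iteration, leaf
    if "fields" in groups:
        kind = "mesh"
    elif "particles" in groups:
        kind = "particles"
    else:
        kind = "custom"
    return path, kind, shape


results = [classify(line) for line in LISTING.splitlines()]
for path, kind, shape in results:
    print(f"{kind:9s} {shape} {path}")
```

With this rule the RNGProvider dataset is picked up as a mesh even though it sits inside picongpu_internal/, while the two idProvider datasets are left alone as custom data.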