
Allow user to store non-openPMD information

Open RemiLehe opened this issue 9 years ago • 11 comments

Today I had a request from a prospective user of openPMD: can users store other datasets in HDF5 openPMD files that are not meant to be read by the openPMD parser/viewer (for instance, because they fall into neither the mesh category nor the particle category)?

From my point of view, the best way to do this is to store this data outside of the basePath (in this case the parser will not try to read it).

@ax3l: Does that make sense? If yes, I think it would be good to add it as a side remark in the standard.

RemiLehe avatar Dec 04 '15 18:12 RemiLehe

Absolutely, they can store it anywhere as long as it does not collide with reserved attributes. The standard is not exclusive and only defines a minimal required markup :)

That is also how new extensions might develop: from best practices of existing additional attributes & records.

Regarding other data sets (records): if they are stored in the base path + particle/mesh path, they need to fulfill the requirements of those locations. Otherwise, just add them to another, non-reserved location inside the base path and it will be fine, since we don't look for standardized formatting outside those paths.
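
The rule above can be sketched as a small path check. This is only an illustration, assuming the layout values `/data/%T/`, `meshes/` and `particles/` (a real file declares its own values in its attributes); `classify` is a hypothetical helper, not part of any openPMD tool.

```python
import re

# Assumed layout values; a real file declares them in its root attributes.
BASE_PATH = "/data/%T/"
MESHES_PATH = "meshes/"
PARTICLES_PATH = "particles/"


def classify(path: str) -> str:
    """Return 'mesh', 'particle' or 'custom' for an absolute dataset path."""
    # Turn "/data/%T/" into a regex matching one iteration, e.g. "/data/0/".
    base_re = re.escape(BASE_PATH).replace(re.escape("%T"), r"\d+")
    match = re.match(base_re, path)
    if match is None:
        return "custom"  # outside basePath
    rest = path[match.end():]
    if rest.startswith(MESHES_PATH):
        return "mesh"  # must follow the mesh record requirements
    if rest.startswith(PARTICLES_PATH):
        return "particle"  # must follow the particle record requirements
    return "custom"  # non-reserved location inside basePath
```

Under these assumptions, `classify("/data/0/my_notes/table")` falls into the `custom` branch, while anything below `/data/0/meshes/` is expected to follow the mesh rules.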

ax3l avatar Dec 04 '15 21:12 ax3l

> Does that make sense? If yes, I think it would be good to add it as a side remark in the standard.

Makes absolute sense, and good point; we should make this clearer in the standard. I marked it as a revision change (patch level), which means it can be added in, e.g., a 1.0.1 release (as you already marked!).

ax3l avatar Dec 05 '15 21:12 ax3l

Ok, great! I'll do a corresponding PR in the next few days.

RemiLehe avatar Dec 07 '15 18:12 RemiLehe

Another, orthogonal approach that would also allow non-openPMD information inside basePath: if one wants to avoid parsing of additional records, specifically

  • directories
  • data sets

(since additional attributes do not harm the parsers), we could also provide a prefix that is ignored by the parsers. Let's say their names must start with "+". But maybe that is messy.

This would allow experimenting with new additional information, e.g., irregular mesh geometries, inside basePath.

Currently, only attributes can be freely added at any place (well, it's recommended to name them comment). Groups and data sets are restricted inside those paths.
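
A minimal sketch of what such an ignore prefix could mean for a reader, assuming the "+" convention floated above (it is not part of the standard) and a hypothetical helper `parsed_children` that receives the child names of a group:

```python
IGNORE_PREFIX = "+"  # prefix floated in this comment; not part of the standard


def parsed_children(names):
    """Keep only the group/dataset names a parser would still inspect."""
    return [name for name in names if not name.startswith(IGNORE_PREFIX)]


children = ["E", "B", "+irregular_geometry"]
print(parsed_children(children))  # ['E', 'B']
```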

ax3l avatar Dec 10 '15 12:12 ax3l

implemented in 1.0.1

ax3l avatar Nov 24 '17 14:11 ax3l

@ax3l

> Currently, only attributes can be freely added at any place (well, it's recommended to name them comment). Groups and data sets are restricted inside those paths.

I would propose that groups and data sets also be allowed to be freely added. I can imagine situations where, for example, a person wants to add per-particle data, and the restriction that this has to be put outside of the group that holds the particle data makes things very messy. Certainly we could mandate that such added groups or data sets be marked as extra, for example using a "+" prefix as you suggested. I think this is a fairly clean solution.

DavidSagan avatar Mar 03 '19 22:03 DavidSagan

Alternative (and maybe more radical) suggestion:

  • Allow custom group hierarchies with custom datasets and custom attributes inside every iteration
  • Treat meshes and particles as keywords, reserved to openPMD (to be more precise: whatever is defined in meshesPath and particlesPath)
  • Inside these paths, the typical openPMD hierarchy applies, and all data should follow strictly the openPMD standard

The fundamental idea would be that an openPMD dataset is not merely (1) augmented by custom hierarchies (i.e. the classical openPMD hierarchy with other stuff around it that the API ignores), but rather that (2) an openPMD dataset is a custom hierarchy with the classical openPMD structure embedded into it at any place. Instead of ignoring custom hierarchies, openPMD could then benefit from and interact with them.

Example: Mesh refinement currently works via the naming of the meshes. Alternatively, one could do:

/data/0/refined_mesh_levels/0/meshes/E
/data/0/refined_mesh_levels/0/meshes/B
/data/0/refined_mesh_levels/1/meshes/E
/data/0/refined_mesh_levels/1/meshes/B
/data/0/refined_mesh_levels/2/meshes/E
/data/0/refined_mesh_levels/2/meshes/B
+++++++ ––––––––––––––––––––– ++++++++
standard        custom        standard

/data/0/simulation_internal/some_checkpointing_info
+++++++ –––––––––––––––––––––––––––––––––––––––––––
standard                  custom

Codes such as PIConGPU can put their internal datasets (e.g. PIConGPU_id_provider) anywhere in that hierarchy, and they would be ignored instead of cluttering the openPMD dataset.

Ideally, if done correctly, this would mean that a single dataset can use several standards at the same time, such as mixing Nexus with openPMD.

Downside: No huge change for the standard, but a rather large change for implementations. Readers would need to be updated to find openPMD structures throughout the datasets.
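
To give a feeling for the reader-side change, here is a rough sketch of splitting a path at the first reserved keyword. It assumes `meshes` and `particles` as the reserved names (in practice whatever meshesPath and particlesPath define); `split_custom_and_standard` is a hypothetical helper, not an existing API.

```python
MESHES_KEY = "meshes"        # assumed value of meshesPath, without the slash
PARTICLES_KEY = "particles"  # assumed value of particlesPath, without the slash


def split_custom_and_standard(path):
    """Split a path at the first reserved keyword.

    Returns (prefix, keyword, remainder); keyword is None when the whole
    path is custom.
    """
    parts = path.strip("/").split("/")
    for index, part in enumerate(parts):
        if part in (MESHES_KEY, PARTICLES_KEY):
            prefix = "/" + "/".join(parts[:index])
            remainder = "/".join(parts[index + 1:])
            return prefix, part, remainder
    return "/" + "/".join(parts), None, ""


print(split_custom_and_standard("/data/0/refined_mesh_levels/1/meshes/E"))
# ('/data/0/refined_mesh_levels/1', 'meshes', 'E')
```

A reader would then apply the usual openPMD rules to everything below the keyword and leave the prefix to the application.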

franzpoeschel avatar Sep 21 '22 17:09 franzpoeschel

That sounds useful and would be equivalent to relaxing meshesPath from values like meshesPath="meshes/" to regexes such as meshesPath=".*meshes/" (or hard-coding this exact regex in the standard).
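
A quick sketch of how that regex would behave, using Python's `re` module and assuming paths are taken relative to the iteration group; `mesh_record` is a hypothetical helper for illustration only.

```python
import re

MESHES_PATTERN = re.compile(r".*meshes/")  # the regex from the comment above


def mesh_record(relative_path):
    """Return the record path below the mesh group, or None if there is no match.

    `relative_path` is taken relative to the iteration group.
    """
    match = MESHES_PATTERN.match(relative_path)
    return relative_path[match.end():] if match else None


print(mesh_record("refined_mesh_levels/0/meshes/E/x"))  # E/x
print(mesh_record("simulation_internal/info"))          # None
```

Note that this exact pattern also accepts groups whose names merely end in "meshes", such as `my_meshes/`; a stricter variant like `(.*/)?meshes/` would avoid that.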

I am not sure if we will not need an "exclude this from parsing" nonetheless via an attribute on custom groups/variables - without it we would keep things definitely fully separate besides sharing an iteration/snapshot id (if that works then that is fine).

ax3l avatar Oct 12 '22 18:10 ax3l

For the HELPMI project, I drew up some visualizations of the proposed addition.

openPMD currently: [image: opmd_hierarchy]

Proposed extension: [image: opmd_hierarchy_extended]

> That sounds useful and would be equivalent to relaxing meshes path from values like meshesPath="meshes/" to regexes such as meshesPath=".*meshes/" (or the hard-coding of this exact regex in the standard).

Using a regex is one of the options, yes. Another (more restricted) option would be a list of paths. Even though it's redundant, I would even suggest a list of patterns, as that is a common workflow in file managing software?

> I am not sure if we will not need an "exclude this from parsing" nonetheless via an attribute on custom groups/variables - without it we would keep things definitely fully separate besides sharing an iteration/snapshot id (if that works then that is fine).

Using exclude patterns is a common enough pattern in a lot of software (rsync, git ignore, backup software, …), so, I'm fine with using that. I don't understand what you mean by "without it we would keep things definitely fully separate besides sharing an iteration/snapshot id"?

franzpoeschel avatar Apr 24 '23 17:04 franzpoeschel

Sounds great. Designing as lists of patterns/paths is a good idea.

The last comment was simply: yes, I think we need an exclude pattern, too (as in rsync, git ignore, backup software, ...).
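
As a sketch of how include and exclude pattern lists might combine, assuming shell-style globs via Python's `fnmatch` and paths relative to the iteration group. The pattern lists, their names and the rule that excludes take precedence are all assumptions for illustration; nothing here is standardized.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern lists; neither the names nor the glob syntax are standardized.
MESH_PATTERNS = ["meshes/*", "*/meshes/*"]
EXCLUDE_PATTERNS = ["simulation_internal/*"]


def is_mesh_path(relative_path):
    """True if the path matches a mesh pattern and no exclude pattern."""
    if any(fnmatchcase(relative_path, pat) for pat in EXCLUDE_PATTERNS):
        return False  # in this sketch, excludes take precedence
    return any(fnmatchcase(relative_path, pat) for pat in MESH_PATTERNS)
```

With these lists, `refined_mesh_levels/0/meshes/E` is treated as a mesh path, while `simulation_internal/meshes/E` is skipped even though it contains `meshes/`.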

ax3l avatar Apr 25 '23 18:04 ax3l

Real-life WIP example from PIConGPU: checkpointing information is stored under picongpu_internal/; the RNGProvider is a field inside that group (normal openPMD markup), while idProvider contains two non-openPMD datasets.

  float     /data/1000/fields/B/x                                      {64, 64, 64}
  float     /data/1000/fields/B/y                                      {64, 64, 64}
  float     /data/1000/fields/B/z                                      {64, 64, 64}
  float     /data/1000/fields/Convolutional PML B/xy                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML B/xz                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML B/yx                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML B/yz                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML B/zx                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML B/zy                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML E/xy                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML E/xz                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML E/yx                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML E/yz                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML E/zx                   {1, 1, 198144}
  float     /data/1000/fields/Convolutional PML E/zy                   {1, 1, 198144}
  float     /data/1000/fields/E/x                                      {64, 64, 64}
  float     /data/1000/fields/E/y                                      {64, 64, 64}
  float     /data/1000/fields/E/z                                      {64, 64, 64}
  float     /data/1000/particles/e/momentum/x                          {55401}
  float     /data/1000/particles/e/momentum/y                          {55401}
  float     /data/1000/particles/e/momentum/z                          {55401}
  uint64_t  /data/1000/particles/e/particlePatches/extent/x            {1}
  uint64_t  /data/1000/particles/e/particlePatches/extent/y            {1}
  uint64_t  /data/1000/particles/e/particlePatches/extent/z            {1}
  uint64_t  /data/1000/particles/e/particlePatches/numParticles        {1}
  uint64_t  /data/1000/particles/e/particlePatches/numParticlesOffset  {1}
  uint64_t  /data/1000/particles/e/particlePatches/offset/x            {1}
  uint64_t  /data/1000/particles/e/particlePatches/offset/y            {1}
  uint64_t  /data/1000/particles/e/particlePatches/offset/z            {1}
  float     /data/1000/particles/e/position/x                          {55401}
  float     /data/1000/particles/e/position/y                          {55401}
  float     /data/1000/particles/e/position/z                          {55401}
  int32_t   /data/1000/particles/e/positionOffset/x                    {55401}
  int32_t   /data/1000/particles/e/positionOffset/y                    {55401}
  int32_t   /data/1000/particles/e/positionOffset/z                    {55401}
  float     /data/1000/particles/e/weighting                           {55401}
  char      /data/1000/picongpu_internal/fields/RNGProvider3XorMin     {64, 64, 1536}
  uint64_t  /data/1000/picongpu_internal/idProvider/nextId             {1, 1, 1}
  uint64_t  /data/1000/picongpu_internal/idProvider/startId            {1, 1, 1}
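
For anyone who wants to post-process a listing like the one above, here is a small sketch that parses one line and flags the entries below `picongpu_internal/`. The line format (type, path, extent in braces) is inferred from this listing only, and `parse_line` and `is_internal` are hypothetical helpers.

```python
import re

# type, absolute path (may contain spaces), extent in braces
LINE_RE = re.compile(r"^\s*(\S+)\s+(/.*?)\s+\{([^}]*)\}\s*$")


def parse_line(line):
    """Split one listing line into (dtype, path, extent)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized listing line: {line!r}")
    dtype, path, extent = match.groups()
    return dtype, path, tuple(int(n) for n in extent.split(","))


def is_internal(path, marker="picongpu_internal"):
    """True if the path lies below the custom group named in the example."""
    return marker in path.split("/")
```

The lazy match on the path is there because record names such as `Convolutional PML B` contain spaces.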

franzpoeschel avatar Jun 05 '23 17:06 franzpoeschel