
Extend global single value variables to behave more like variable attributes

Open franzpoeschel opened this issue 3 years ago • 2 comments

Summary: Global single value variables are a useful feature for storing variable metadata. As a result, any feature supported by ADIOS2 attributes (constant metadata) is probably useful for global single values, too. Ideally, both concepts can be semantically unified to support a general notion of (changing vs. constant) metadata.

Proposed feature: Global single values sit conceptually between "regular" ADIOS2 variables (global arrays) and ADIOS2 attributes. Unlike attributes, they are variable and can change across steps as a first-class feature. Unlike global array variables, they are stored as part of the metadata.

Quoting the documentation linked above:

These variables are helpful for storing global information, preferably managed by only one MPI process, that may or may not change over steps: e.g. total number of particles, collective norm, number of nodes/cells, etc.

=> Effectively, these are intended for variable (~changing) metadata, while ADIOS2 attributes are intended for constant metadata. ADIOS2 attributes have some features that would remain useful even when metadata changes:

  • Aggregation. ADIOS2 attributes are only written by rank 0; duplicate definitions are eliminated.
  • Small vectors. Metadata is generally small, but it is not generally a single value. Examples: axis labels (a vector of three strings), SI units (a vector of seven numbers). There were past discussions about whether vector-valued attributes blur the line between metadata and parallel data too much. For many workflows, the defining characteristic of metadata is not that it is a single value, but that it is small and not parallel.

This feature request proposes to add these features to global ~~single~~ values.

Why is this feature important? From our perspective, openPMD metadata generally changes across steps. We are experimenting now with a new data schema for openPMD in ADIOS2 based on global single value variables for implementing openPMD attributes (seminal PR for these efforts) and it is our experience that this fixes many tricky edge cases when using ADIOS2 steps. However, @guj has noticed performance problems at large scale and we have related those to the above two missing features:

  1. Aggregation: Unlike with attributes, defining a variable on n ranks leads to n instances of it. Example executed with 14 parallel ranks:
    > bpls -D dataset.bp
    …
      double    /data/meshes/mymesh/unitSI            scalar
            step 0: 14 instances available
      uint64_t  /data/snapshot                        scalar
            step 0: 14 instances available
    …
    
  2. Small vectors. These currently have to be written as array variables. If they are otherwise treated like attributes, i.e. defined on every rank, n blocks are written:
    > bpls -D dataset.bp
    …
      double    /data/meshes/mymesh/unitDimension     {7}
            step 0: 
              block  0: [0:6]
              block  1: [0:6]
              block  2: [0:6]
              block  3: [0:6]
              block  4: [0:6]
              block  5: [0:6]
              block  6: [0:6]
              block  7: [0:6]
              block  8: [0:6]
              block  9: [0:6]
              block 10: [0:6]
              block 11: [0:6]
              block 12: [0:6]
              block 13: [0:6]
    …
    

What is the potential impact of this feature in the community? Users can decide between constant and changing metadata with less worry about how expressive and performant this makes their data. The proposed change unifies the semantics of attributes and global ~~single~~ values, and clarifies the distinction between the two concepts as "constant vs. changing".

Is your feature request related to a problem? Please describe. See above.

Describe the solution you'd like and potential required effort: Mostly described already. API-wise, this would probably require a more explicit definition of global single values.

// today: a global single value is defined without any dimensions
adios2::Variable<uint32_t> varNodes = io.DefineVariable<uint32_t>("Nodes");
// extended, new: proposed syntax, not an existing API
adios2::Variable<uint32_t> varNodes = io.DefineVariable<uint32_t>("Nodes", {adios2::GlobalValue, 7});

(Compare the existing use of adios2::LocalValueDim.) Optionally, global values could be distinguished more clearly from global arrays in the API, so that user code separates the handling of metadata from that of actual data (as is done today with the distinction between, e.g., AvailableVariables and AvailableAttributes).

Effort depends on how reusable the metadata aggregation of attributes is and on how extensibly global single values are implemented today. It might touch data formats, too.

Describe alternatives you've considered and potential required effort: Our intermediate solution will be to add a mode to openPMD-api that allows us to assume that all attributes written from a rank other than 0 can be dropped. This does not work for all use cases (e.g. where datasets are defined only on certain ranks), and a more general solution can only be implemented with an additional aggregation step on our side, since we otherwise don't know what our users are doing.
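
To make the trade-off between the two alternatives concrete, here is a minimal, self-contained sketch in plain C++ (no ADIOS2, MPI or openPMD-api calls). All names in it (`Metadata`, `keepRankZeroOnly`, `aggregate`) are hypothetical and exist only for illustration. The per-rank maps stand in for whatever each rank would define, and the sketch assumes they have already been gathered in one place.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the metadata one rank defines:
// name -> small vector of values (a single value is a vector of length 1).
using Metadata = std::map<std::string, std::vector<double>>;

// Alternative 1: assume everything written by ranks other than 0 can be dropped.
// Cheap, but loses any entry that rank 0 did not define itself.
Metadata keepRankZeroOnly(const std::vector<Metadata> &perRank)
{
    return perRank.empty() ? Metadata{} : perRank.front();
}

struct Aggregated
{
    Metadata merged;                 // one entry per name, first definition wins
    std::set<std::string> conflicts; // names that ranks defined with differing values
};

// Alternative 2: an explicit aggregation step over all ranks.
// Identical duplicate definitions collapse into one entry; differing ones are reported.
Aggregated aggregate(const std::vector<Metadata> &perRank)
{
    Aggregated result;
    for (const Metadata &rankMeta : perRank)
    {
        for (const auto &entry : rankMeta)
        {
            auto insertion = result.merged.emplace(entry.first, entry.second);
            const bool alreadyPresent = !insertion.second;
            if (alreadyPresent && insertion.first->second != entry.second)
            {
                result.conflicts.insert(entry.first);
            }
        }
    }
    return result;
}
```

In this model, an entry defined only on ranks 1 and 2 disappears under `keepRankZeroOnly` but survives `aggregate`, which is the use case the rank-0 assumption cannot cover. In a real parallel run the gathering itself would be the additional communication step; how to resolve a reported conflict is left open here.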

Additional context: Discussed last week with @pnorbert and @ax3l.

franzpoeschel • Jul 19 '21 09:07