[WIP] Use JSON/TOML template for defining openPMD metadata in a config file

Open franzpoeschel opened this issue 3 years ago • 5 comments

Not relevant for next release

  • Until now: Simulations specify their metadata in code via API calls.
  • With this PR: In some workflows (e.g. experiments) there is no omniscient simulation; instead, the experimenters enter the metadata via configuration files. Using the API is not a good workflow for that.

Idea: We already have a JSON backend, so we can use an openPMD-conforming JSON dataset that defines only metadata. With this, the configuration file is just another openPMD dataset. Then, add some functionality to initialize an empty Series from such a metadata file.

TODO:

  • [x] Add a template mode to the JSON backend that (1) does not pre-fill datasets, (2) does not allow writing to datasets
  • [x] Use TOML as an alternative target for the JSON backend (better because it allows comments)
  • [x] Add a basic initializeFromTemplate() functionality
  • [ ] Check how this fits experiment workflows, adapt
  • [x] Make JSON/TOML maybe a bit easier to write, e.g. make the datatype field optional
  • [ ] Maybe find another way to distinguish groups from datasets, so that not even dtype and extent are required
  • [ ] Maybe find a way to make default attributes optional
  • [x] Merge #1218 first
  • [x] Introduce a good workflow to use a variable-based single iteration Series such that the single iteration represents all iterations
  • [ ] Check interplay with Read Mode changes
  • [x] Maybe automatically detect template mode in reading
  • [x] Merge https://github.com/openPMD/openPMD-api/pull/1278 first
  • [ ] documentation
  • [ ] Merge #1493 first
  • [ ] Will probably need an update to work together with #1432

https://github.com/franzpoeschel/openPMD-api/compare/topic-json-short-modes..topic-json-template

franzpoeschel, May 17 '22 15:05

An openPMD dataset in TOML:

[platform_byte_widths]
USHORT = 2
ULONG = 8
BOOL = 1
CLONG_DOUBLE = 32
LONGLONG = 8
CFLOAT = 8
CHAR = 1
DOUBLE = 8
CDOUBLE = 16
SHORT = 2
UCHAR = 1
FLOAT = 4
INT = 4
ULONGLONG = 8
UINT = 4
LONG = 8
LONG_DOUBLE = 16

[data]

[data.0]

[data.0.meshes]

[data.0.meshes.E]

[data.0.meshes.E.x]
datatype = "FLOAT"
data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

[data.0.meshes.E.x.attributes]

[data.0.meshes.E.x.attributes.unitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.meshes.E.x.attributes.position]
value = [0.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes]

[data.0.meshes.E.attributes.timeOffset]
value = 0.0
datatype = "FLOAT"

[data.0.meshes.E.attributes.gridUnitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.meshes.E.attributes.gridSpacing]
value = [1.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes.gridGlobalOffset]
value = [0.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes.unitDimension]
value = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
datatype = "ARR_DBL_7"

[data.0.meshes.E.attributes.geometry]
value = "cartesian"
datatype = "STRING"

[data.0.meshes.E.attributes.dataOrder]
value = "C"
datatype = "STRING"

[data.0.meshes.E.attributes.axisLabels]
value = ["x"]
datatype = "VEC_STRING"

[data.0.attributes]

[data.0.attributes.timeUnitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.attributes.time]
value = 0.0
datatype = "DOUBLE"

[data.0.attributes.dt]
value = 1.0
datatype = "DOUBLE"

[attributes]

[attributes.softwareVersion]
value = "0.15.0-dev"
datatype = "STRING"

[attributes.software]
value = "openPMD-api"
datatype = "STRING"

[attributes.openPMDextension]
value = 0
datatype = "UINT"

[attributes.meshesPath]
value = "meshes/"
datatype = "STRING"

[attributes.iterationFormat]
value = "many_iterations_%T"
datatype = "STRING"

[attributes.iterationEncoding]
value = "fileBased"
datatype = "STRING"

[attributes.openPMD]
value = "1.1.0"
datatype = "STRING"

[attributes.date]
value = "2022-05-18 12:20:23 +0000"
datatype = "STRING"

[attributes.basePath]
value = "/data/%T/"
datatype = "STRING"
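
In the dump above, every attribute is its own table with a `value` and a `datatype` key. The following minimal Python sketch shows how such entries could be collapsed into plain key/value pairs. The helper name `to_shorthand` is hypothetical and the code works on plain dictionaries only; it is not part of openPMD-api. The sample values are taken from the dump.

```python
from typing import Any, Dict


def to_shorthand(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse {"value": ..., "datatype": ...} entries to the bare value.

    Hypothetical helper for illustration; the datatype annotation is
    dropped, so it would have to be guessed again when reading.
    """
    shorthand = {}
    for name, entry in attributes.items():
        if isinstance(entry, dict) and set(entry) == {"value", "datatype"}:
            shorthand[name] = entry["value"]
        else:
            # Anything that is not an annotated attribute is kept unchanged.
            shorthand[name] = entry
    return shorthand


# Attributes of data.0.meshes.E in the verbose layout
verbose = {
    "gridUnitSI": {"value": 1.0, "datatype": "DOUBLE"},
    "geometry": {"value": "cartesian", "datatype": "STRING"},
    "axisLabels": {"value": ["x"], "datatype": "VEC_STRING"},
}

compact = to_shorthand(verbose)
print(compact)
```

Dropping the annotation loses information, for example the distinction between `FLOAT` and `DOUBLE`, so a reader of the compact form has to infer the type from the value.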

franzpoeschel, May 18 '22 12:05

This is now a simplified TOML openPMD template, created by {"json":{"mode": "template"}}:

[data]

[data.meshes]

[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.temperature.attributes]
timeOffset = 0.0
# Explicit datatype can still be used if needed
unitSI = {"value" = 1.0, "datatype" = "FLOAT"}
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0

[attributes]
softwareVersion = "0.15.0-dev"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 11:55:07 +0000"
basePath = "/data"

Differences to regular JSON/TOML openPMD datasets:

  1. Platform byte width table is missing
  2. Attributes don't explicitly store their datatypes. Instead, the datatypes are restored dynamically (and a bit heuristically) from the values that are there.
  3. No actual datasets can be written, instead just the extent is stored.
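
To illustrate point 2, here is a minimal Python sketch of how a datatype could be guessed from a shorthand value. The function name `infer_datatype` and the concrete mapping (integers to `LONG`, floats to `DOUBLE`, mixed numeric lists to `VEC_DOUBLE`) are assumptions made for this example and not the rules that the JSON backend actually applies.

```python
from typing import Any


def infer_datatype(value: Any) -> str:
    """Guess an openPMD datatype name for an attribute value.

    Illustrative heuristic only; the mapping below is an assumption.
    """
    # An explicit {"value": ..., "datatype": ...} entry always wins.
    if isinstance(value, dict) and {"value", "datatype"} <= value.keys():
        return value["datatype"]
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list) and value:
        element_types = {infer_datatype(item) for item in value}
        if element_types == {"STRING"}:
            return "VEC_STRING"
        if element_types <= {"LONG", "DOUBLE"}:
            # A single float promotes the whole list to floating point.
            return "VEC_DOUBLE" if "DOUBLE" in element_types else "VEC_LONG"
    raise ValueError(f"cannot infer a datatype for {value!r}")


for sample in (1.0, "cartesian", ["x"], [0.0], 0,
               {"value": 1.0, "datatype": "FLOAT"}):
    print(repr(sample), "->", infer_datatype(sample))
```

A heuristic like this cannot tell `FLOAT` from `DOUBLE` or `UINT` from `LONG`, which is why the explicit form shown for `unitSI` in the TOML template remains useful.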

Template mode is also available in JSON:

{
  "attributes": {
    "basePath": "/data",
    "date": "2022-05-19 12:00:09 +0000",
    "iterationEncoding": "variableBased",
    "iterationFormat": "/data",
    "meshesPath": "meshes/",
    "openPMD": "1.1.0",
    "openPMDextension": 0,
    "software": "openPMD-api",
    "softwareVersion": "0.15.0-dev"
  },
  "data": {
    "attributes": {
      "dt": 1,
      "snapshot": 0,
      "time": 0,
      "timeUnitSI": 1
    },
    "meshes": {
      "temperature": {
        "attributes": {
          "axisLabels": [
            "x"
          ],
          "dataOrder": "C",
          "geometry": "cartesian",
          "gridGlobalOffset": [
            0
          ],
          "gridSpacing": [
            1
          ],
          "gridUnitSI": 1,
          "position": [
            0
          ],
          "timeOffset": 0,
          "unitDimension": [
            0,
            0,
            0,
            0,
            0,
            0,
            0
          ],
          "unitSI": 1
        },
        "datatype": "FLOAT",
        "extent": [
          5,
          5
        ]
      }
    }
  }
}
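
Since a template is ordinary JSON, it can be inspected with any JSON parser. The sketch below walks a shortened copy of the template above and lists every node that carries both `extent` and `datatype`, which is how datasets are currently told apart from groups. The function name `iter_datasets` is hypothetical and only the Python standard library is used, not openPMD-api.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

# Shortened copy of the JSON template from this thread
TEMPLATE = """
{
  "attributes": {"iterationEncoding": "variableBased", "openPMD": "1.1.0"},
  "data": {
    "attributes": {"dt": 1, "time": 0},
    "meshes": {
      "temperature": {
        "attributes": {"unitSI": 1, "axisLabels": ["x"]},
        "datatype": "FLOAT",
        "extent": [5, 5]
      }
    }
  }
}
"""


def iter_datasets(
    node: Dict[str, Any], path: str = ""
) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (path, datatype, extent) for every dataset definition."""
    if "extent" in node and "datatype" in node:
        yield path or "/", node["datatype"], node["extent"]
        return
    for key, child in node.items():
        # "attributes" holds metadata, never nested groups or datasets.
        if key != "attributes" and isinstance(child, dict):
            yield from iter_datasets(child, f"{path}/{key}")


datasets = list(iter_datasets(json.loads(TEMPLATE)))
for dataset_path, datatype, extent in datasets:
    print(dataset_path, datatype, extent)
```

For the shortened template this reports the single dataset `/data/meshes/temperature` with datatype `FLOAT` and extent `[5, 5]`.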

franzpoeschel, May 19 '22 12:05

Longer example:

[data]

[data.particles]

[data.particles.e]

[data.particles.e.positionOffset]

[data.particles.e.positionOffset.z]

[data.particles.e.positionOffset.z.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.y]

[data.particles.e.positionOffset.y.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.x]

[data.particles.e.positionOffset.x.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0

[data.particles.e.position]

[data.particles.e.position.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.z.attributes]
unitSI = 1.0

[data.particles.e.position.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.y.attributes]
unitSI = 1.0

[data.particles.e.position.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.x.attributes]
unitSI = 1.0

[data.particles.e.position.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0

[data.particles.e.particlePatches]

[data.particles.e.particlePatches.numParticlesOffset]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.numParticlesOffset.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.numParticles]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.numParticles.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset]

[data.particles.e.particlePatches.offset.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.z.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.y.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.x.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

[data.particles.e.particlePatches.extent]

[data.particles.e.particlePatches.extent.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.z.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.y.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.x.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

[data.meshes]

[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.temperature.attributes]
timeOffset = 0.0
unitSI = 1.0
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.meshes.E]

[data.meshes.E.z]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.z.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.y]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.y.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.x]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.x.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.attributes]
timeOffset = 0.0
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0

[attributes]
softwareVersion = "0.15.0-dev"
particlesPath = "particles/"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 15:26:37 +0000"
basePath = "/data"
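
The longer example contains two kinds of record components: those under `position` define `extent` and `datatype`, while those under `positionOffset` are constant components whose attributes carry `value` and `shape`. The following Python sketch separates these cases on plain dictionaries. The function name `classify` and the labels it returns are made up for this illustration and do not come from openPMD-api.

```python
from typing import Any, Dict


def classify(node: Dict[str, Any]) -> str:
    """Label a template node as 'dataset', 'constant' or 'group'.

    Hypothetical helper for illustration only.
    """
    if "extent" in node and "datatype" in node:
        return "dataset"
    attributes = node.get("attributes", {})
    if "value" in attributes and "shape" in attributes:
        return "constant"
    return "group"


# Excerpt of the particle species "e" from the TOML above, as a dict
species = {
    "position": {
        "x": {
            "extent": [5, 5],
            "datatype": "FLOAT",
            "attributes": {"unitSI": 1.0},
        },
        "attributes": {"timeOffset": 0.0},
    },
    "positionOffset": {
        "x": {"attributes": {"value": 3.14, "unitSI": 1.0, "shape": [5, 5]}},
        "attributes": {"timeOffset": 0.0},
    },
}

kinds = {
    "position": classify(species["position"]),
    "position/x": classify(species["position"]["x"]),
    "positionOffset/x": classify(species["positionOffset"]["x"]),
}
print(kinds)
```

Constant components need no dataset definition at all, so they can already be written in a template without `extent` or `datatype`.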

franzpoeschel, May 19 '22 15:05

Notes for myself on the recent reordering of commits:

5 3ee509de (HEAD -> topic-json-template, origin/topic-json-template) Properly deal with undefined datasets
2 06da2d58 Make JSON and TOML look like two different backends
5 960ab21a Initialize Dataset definitions from template
5 b88bae67 Initialize Series attributes from template
3 6302a33c Fix NVHPC Toml11 open mode
2 d825008b Fix precision-losing type conversion
4 da960a23 Enable .toml tests in generic tests
4 0398b86f Extend example
3 7332996e Windows compatibility
x 85527799 Add and use Attribute::getOptional<T>()
1 64cde966 Template mode: Fill with zero upon read
1 fa483843 Write/read shorthand attributes without explicit datatype
3 bd8da013 CI fixes
1 d802d2ac Don't write platform datatype size table in template mode
2 cba71f7f Use .toml as filename extension
2 b019a7d1 TOML as alternative backend for JSON backend
1 4b25de8c Select template mode via JSON param
1 8ef4753f Add template mode to JSON backend

franzpoeschel, Jul 21 '22 16:07

How to go forward with this PR

This PR implements a set of features that work well together, but also make sense on their own. I will split this PR into several parts in order to merge small chunks of it to dev:

  1. Simplified attribute layout in JSON backend without datatype annotation
  2. Optionally simplified dataset layout in JSON backend without datatype annotation (not yet implemented)
  3. Template mode (dataset extent + datatype instead of actual data)
  4. TOML backend
  5. Helpers for config file workflows
  6. Maybe: Copy datatype names from Datatype.hpp into JSON backend in order to avoid unwanted changes

Feature-wise, this requires decoupling template mode from the simplified attribute layout. That decoupling is not yet implemented, but it makes sense.

franzpoeschel, Apr 14 '23 14:04