openPMD-api
[WIP] Use JSON/TOML template for defining openPMD metadata in a config file
Not relevant for next release
- Until now: Simulations specify their metadata in-code via API calls
- With this PR: In some workflows (e.g. experiments), there is no omniscient simulation; instead, experimenters enter the metadata via configuration files, and the API is not a good fit for that workflow
Idea: We already have a JSON backend, so use an openPMD-conforming JSON dataset that defines only metadata. The configuration file then becomes just another openPMD dataset. Then, add functionality to initialize an empty Series from such a metadata file.
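A minimal sketch of that idea, modelling a Series as a plain nested dict instead of using the real openPMD-api classes. The function `initialize_from_template()` is hypothetical (it is not the PR's `initializeFromTemplate()`): it copies every attribute and every dataset definition (extent + datatype, no data) from a metadata-only template into a Series, keeping anything the Series already defines. The dict layout is assumed to mirror the JSON template shown further down.

```python
import copy


def initialize_from_template(series: dict, template: dict) -> dict:
    """Hypothetical helper: fill a Series-like nested dict from a template.

    - "attributes" entries are copied unless the Series already has them,
    - dataset definitions (nodes with "extent" and "datatype") are declared
      without any data,
    - all other dict nodes are treated as groups and recursed into.
    """
    for key, node in template.items():
        if key == "attributes":
            attrs = series.setdefault("attributes", {})
            for name, value in node.items():
                attrs.setdefault(name, copy.deepcopy(value))
        elif isinstance(node, dict) and "extent" in node and "datatype" in node:
            # Dataset definition: shape and type only, contents come later.
            dataset = series.setdefault(key, {})
            dataset.setdefault("extent", list(node["extent"]))
            dataset.setdefault("datatype", node["datatype"])
            if "attributes" in node:
                initialize_from_template(dataset, {"attributes": node["attributes"]})
        elif isinstance(node, dict):
            # Plain group: recurse.
            initialize_from_template(series.setdefault(key, {}), node)
    return series
```

Values set by the caller win over the template, so the template acts as a set of defaults that experimenters can override.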
TODO:
- [x] Add a `template` mode to the JSON backend that (1) does not pre-fill datasets, (2) does not allow writing to datasets
- [x] Use TOML as an alternative target for the JSON backend (better because it allows comments)
- [x] Add a basic `initializeFromTemplate()` functionality
- [ ] Check how this fits experimental workflows, and adapt
- [x] Make JSON/TOML a bit easier to write, e.g. make the `datatype` field optional
- [ ] Maybe find another way to distinguish groups from datasets, so that not even `dtype` and `extent` are required
- [ ] Maybe find a way to make default attributes optional
- [x] Merge #1218 first
- [x] Introduce a good workflow to use a variable-based single iteration Series such that the single iteration represents all iterations
- [ ] Check interplay with Read Mode changes
- [x] Maybe automatically detect template mode in reading
- [x] Merge https://github.com/openPMD/openPMD-api/pull/1278 first
- [ ] documentation
- [ ] Merge #1493 first
- [ ] Will probably need an update to work together with #1432
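The `template` mode from the first TODO item can be pictured as a dataset that records only its extent and datatype. The Python class below is a hypothetical sketch, not the JSON backend's code: writes are rejected, and reads return zeros, following the "Template mode: Fill with zero upon read" commit listed further down.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class TemplateDataset:
    """Hypothetical stand-in for a dataset in template mode."""

    extent: list
    datatype: str

    def store_chunk(self, data, offset, extent):
        # (2) Template mode does not allow writing to datasets.
        raise RuntimeError("template mode: writing dataset contents is not allowed")

    def load_chunk(self, offset, extent):
        # (1) Nothing was pre-filled or stored, so a read yields zeros,
        # returned here as a flat row-major list of prod(extent) elements.
        return [0] * prod(extent)

    def to_json(self):
        # Only the definition is serialized, never any contents.
        return {"extent": list(self.extent), "datatype": self.datatype}
```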
https://github.com/franzpoeschel/openPMD-api/compare/topic-json-short-modes..topic-json-template
An openPMD dataset in TOML:
[platform_byte_widths]
USHORT = 2
ULONG = 8
BOOL = 1
CLONG_DOUBLE = 32
LONGLONG = 8
CFLOAT = 8
CHAR = 1
DOUBLE = 8
CDOUBLE = 16
SHORT = 2
UCHAR = 1
FLOAT = 4
INT = 4
ULONGLONG = 8
UINT = 4
LONG = 8
LONG_DOUBLE = 16
[data]
[data.0]
[data.0.meshes]
[data.0.meshes.E]
[data.0.meshes.E.x]
datatype = "FLOAT"
data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
[data.0.meshes.E.x.attributes]
[data.0.meshes.E.x.attributes.unitSI]
value = 1.0
datatype = "DOUBLE"
[data.0.meshes.E.x.attributes.position]
value = [0.0]
datatype = "VEC_DOUBLE"
[data.0.meshes.E.attributes]
[data.0.meshes.E.attributes.timeOffset]
value = 0.0
datatype = "FLOAT"
[data.0.meshes.E.attributes.gridUnitSI]
value = 1.0
datatype = "DOUBLE"
[data.0.meshes.E.attributes.gridSpacing]
value = [1.0]
datatype = "VEC_DOUBLE"
[data.0.meshes.E.attributes.gridGlobalOffset]
value = [0.0]
datatype = "VEC_DOUBLE"
[data.0.meshes.E.attributes.unitDimension]
value = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
datatype = "ARR_DBL_7"
[data.0.meshes.E.attributes.geometry]
value = "cartesian"
datatype = "STRING"
[data.0.meshes.E.attributes.dataOrder]
value = "C"
datatype = "STRING"
[data.0.meshes.E.attributes.axisLabels]
value = ["x"]
datatype = "VEC_STRING"
[data.0.attributes]
[data.0.attributes.timeUnitSI]
value = 1.0
datatype = "DOUBLE"
[data.0.attributes.time]
value = 0.0
datatype = "DOUBLE"
[data.0.attributes.dt]
value = 1.0
datatype = "DOUBLE"
[attributes]
[attributes.softwareVersion]
value = "0.15.0-dev"
datatype = "STRING"
[attributes.software]
value = "openPMD-api"
datatype = "STRING"
[attributes.openPMDextension]
value = 0
datatype = "UINT"
[attributes.meshesPath]
value = "meshes/"
datatype = "STRING"
[attributes.iterationFormat]
value = "many_iterations_%T"
datatype = "STRING"
[attributes.iterationEncoding]
value = "fileBased"
datatype = "STRING"
[attributes.openPMD]
value = "1.1.0"
datatype = "STRING"
[attributes.date]
value = "2022-05-18 12:20:23 +0000"
datatype = "STRING"
[attributes.basePath]
value = "/data/%T/"
datatype = "STRING"
This is now a simplified TOML openPMD template, created with the option `{"json": {"mode": "template"}}`:
[data]
[data.meshes]
[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"
[data.meshes.temperature.attributes]
timeOffset = 0.0
# Explicit datatype can still be used if needed
unitSI = {"value" = 1.0, "datatype" = "FLOAT"}
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]
[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0
[attributes]
softwareVersion = "0.15.0-dev"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 11:55:07 +0000"
basePath = "/data"
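Reading such a template back means guessing each attribute's datatype from its value, unless the explicit `{value, datatype}` form is used as for `unitSI` above. The Python function below is a hypothetical sketch of such a heuristic; its rules (integers as `LONG`, floats as `DOUBLE`, homogeneous lists as `VEC_*`) are assumptions and need not match what the backend actually does.

```python
def restore_datatype(attr):
    """Hypothetical heuristic: return (value, datatype) for a shorthand attribute."""
    # Explicit form, e.g. unitSI = {"value" = 1.0, "datatype" = "FLOAT"}
    if isinstance(attr, dict) and set(attr) == {"value", "datatype"}:
        return attr["value"], attr["datatype"]
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(attr, bool):
        return attr, "BOOL"
    if isinstance(attr, int):
        return attr, "LONG"
    if isinstance(attr, float):
        return attr, "DOUBLE"
    if isinstance(attr, str):
        return attr, "STRING"
    if isinstance(attr, list) and attr:
        if all(isinstance(x, str) for x in attr):
            return attr, "VEC_STRING"
        if all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in attr):
            if all(isinstance(x, int) for x in attr):
                return attr, "VEC_LONG"
            # Mixed int/float lists are widened to double.
            return [float(x) for x in attr], "VEC_DOUBLE"
    raise ValueError(f"cannot infer a datatype for {attr!r}")
```

With these rules, `unitDimension` comes back as `VEC_DOUBLE` rather than `ARR_DBL_7`, and `timeOffset` as `DOUBLE` rather than `FLOAT`. This shows why the restoration is only heuristic and why the explicit form remains available.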
Differences to regular JSON/TOML openPMD datasets:
- Platform byte width table is missing
- Attributes don't explicitly store their datatypes; instead, datatypes are restored dynamically (and somewhat heuristically) from the stored values.
- No actual dataset contents can be written; only the extent and datatype are stored.
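The three differences above can be sketched as a conversion from the regular layout into the template layout. This Python function is hypothetical and operates on already-parsed dicts; it is not how the backend produces templates. A node counts as a dataset here if it has a `datatype` key, as in the regular TOML example.

```python
def shape_of(data):
    """Extent of a (possibly nested) rectangular list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    extent = []
    while isinstance(data, list):
        extent.append(len(data))
        data = data[0] if data else None
    return extent


def to_template(node):
    """Hypothetical conversion of a regular JSON/TOML openPMD tree into template form."""
    out = {}
    for key, child in node.items():
        if key == "platform_byte_widths":
            continue  # difference 1: no platform byte width table
        if key == "attributes":
            # difference 2: keep the bare value, drop the datatype annotation
            out["attributes"] = {name: a["value"] for name, a in child.items()}
        elif key == "data" and "datatype" in node:
            # difference 3: store the extent instead of the contents
            out["extent"] = shape_of(child)
        elif isinstance(child, dict):
            out[key] = to_template(child)  # group: recurse
        else:
            out[key] = child  # e.g. the dataset's own "datatype"
    return out
```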
Template mode is also available in JSON:
{
"attributes": {
"basePath": "/data",
"date": "2022-05-19 12:00:09 +0000",
"iterationEncoding": "variableBased",
"iterationFormat": "/data",
"meshesPath": "meshes/",
"openPMD": "1.1.0",
"openPMDextension": 0,
"software": "openPMD-api",
"softwareVersion": "0.15.0-dev"
},
"data": {
"attributes": {
"dt": 1,
"snapshot": 0,
"time": 0,
"timeUnitSI": 1
},
"meshes": {
"temperature": {
"attributes": {
"axisLabels": [
"x"
],
"dataOrder": "C",
"geometry": "cartesian",
"gridGlobalOffset": [
0
],
"gridSpacing": [
1
],
"gridUnitSI": 1,
"position": [
0
],
"timeOffset": 0,
"unitDimension": [
0,
0,
0,
0,
0,
0,
0
],
"unitSI": 1
},
"datatype": "FLOAT",
"extent": [
5,
5
]
}
}
}
}
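In this layout, a dataset is recognizable only by carrying both `datatype` and `extent`; every other object below `data` is a group (hence the TODO item about finding another way to tell them apart). The following Python sketch uses only the standard `json` module to list every dataset definition in a trimmed copy of the template above, together with the number of elements it would hold. The classification rule is an assumption that mirrors the examples, not the backend's parser.

```python
import json
from math import prod


def list_datasets(node, path=""):
    """Yield (path, datatype, extent) for every dataset definition in a template tree."""
    if "datatype" in node and "extent" in node:
        yield path, node["datatype"], node["extent"]
        return
    for key, child in node.items():
        if key != "attributes" and isinstance(child, dict):
            yield from list_datasets(child, f"{path}/{key}")


template = json.loads("""
{
  "attributes": {"basePath": "/data", "iterationEncoding": "variableBased"},
  "data": {
    "attributes": {"dt": 1, "time": 0},
    "meshes": {
      "temperature": {"attributes": {"unitSI": 1}, "datatype": "FLOAT", "extent": [5, 5]}
    }
  }
}
""")

for path, dtype, extent in list_datasets(template):
    print(path, dtype, extent, "elements:", prod(extent))
# prints: /data/meshes/temperature FLOAT [5, 5] elements: 25
```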
Longer example:
[data]
[data.particles]
[data.particles.e]
[data.particles.e.positionOffset]
[data.particles.e.positionOffset.z]
[data.particles.e.positionOffset.z.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]
[data.particles.e.positionOffset.y]
[data.particles.e.positionOffset.y.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]
[data.particles.e.positionOffset.x]
[data.particles.e.positionOffset.x.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]
[data.particles.e.positionOffset.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0
[data.particles.e.position]
[data.particles.e.position.z]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.position.z.attributes]
unitSI = 1.0
[data.particles.e.position.y]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.position.y.attributes]
unitSI = 1.0
[data.particles.e.position.x]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.position.x.attributes]
unitSI = 1.0
[data.particles.e.position.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0
[data.particles.e.particlePatches]
[data.particles.e.particlePatches.numParticlesOffset]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.numParticlesOffset.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.numParticles]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.numParticles.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.offset]
[data.particles.e.particlePatches.offset.z]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.offset.z.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.offset.y]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.offset.y.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.offset.x]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.offset.x.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.offset.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[data.particles.e.particlePatches.extent]
[data.particles.e.particlePatches.extent.z]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.extent.z.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.extent.y]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.extent.y.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.extent.x]
extent = [5, 5]
datatype = "FLOAT"
[data.particles.e.particlePatches.extent.x.attributes]
unitSI = 1.0
[data.particles.e.particlePatches.extent.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[data.meshes]
[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"
[data.meshes.temperature.attributes]
timeOffset = 0.0
unitSI = 1.0
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]
[data.meshes.E]
[data.meshes.E.z]
extent = [5, 5]
datatype = "FLOAT"
[data.meshes.E.z.attributes]
unitSI = 1.0
position = [0.0]
[data.meshes.E.y]
extent = [5, 5]
datatype = "FLOAT"
[data.meshes.E.y.attributes]
unitSI = 1.0
position = [0.0]
[data.meshes.E.x]
extent = [5, 5]
datatype = "FLOAT"
[data.meshes.E.x.attributes]
unitSI = 1.0
position = [0.0]
[data.meshes.E.attributes]
timeOffset = 0.0
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]
[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0
[attributes]
softwareVersion = "0.15.0-dev"
particlesPath = "particles/"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 15:26:37 +0000"
basePath = "/data"
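The longer example mixes three kinds of nodes: groups, regular record components declared via `extent` and `datatype`, and constant record components such as `positionOffset/x|y|z`, which carry only `value` and `shape` attributes. The Python sketch below uses hypothetical helper names and classifies nodes of an already-parsed template under exactly these assumptions.

```python
def classify(node):
    """Hypothetical classification of a template node."""
    attrs = node.get("attributes", {})
    if "extent" in node and "datatype" in node:
        return "dataset"  # e.g. position/x
    if "value" in attrs and "shape" in attrs:
        return "constant"  # e.g. positionOffset/x
    return "group"


def walk(node, path=""):
    """Yield (path, kind) for a node and, if it is a group, for all its children."""
    kind = classify(node)
    yield path, kind
    if kind == "group":
        for key, child in node.items():
            if key != "attributes" and isinstance(child, dict):
                yield from walk(child, f"{path}/{key}")
```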
Notes for myself on the recent reordering of commits:
5 3ee509de (HEAD -> topic-json-template, origin/topic-json-template) Properly deal with undefined datasets
2 06da2d58 Make JSON and TOML look like two different backends
5 960ab21a Initialize Dataset definitions from template
5 b88bae67 Initialize Series attributes from template
3 6302a33c Fix NVHPC Toml11 open mode
2 d825008b Fix precision-losing type conversion
4 da960a23 Enable .toml tests in generic tests
4 0398b86f Extend example
3 7332996e Windows compatibility
x 85527799 Add and use Attribute::getOptional<T>()
1 64cde966 Template mode: Fill with zero upon read
1 fa483843 Write/read shorthand attributes without explicit datatype
3 bd8da013 CI fixes
1 d802d2ac Don't write platform datatype size table in template mode
2 cba71f7f Use .toml as filename extension
2 b019a7d1 TOML as alternative backend for JSON backend
1 4b25de8c Select template mode via JSON param
1 8ef4753f Add template mode to JSON backend
How to go forward with this PR
This PR implements a set of features that work well together but also make sense on their own. I will split this PR into several parts in order to merge small chunks of it into dev:
- Simplified attribute layout in JSON backend without datatype annotation
- Optionally simplified dataset layout in JSON backend without datatype annotation (not yet implemented)
- Template mode (dataset extent + datatype instead of actual data)
- TOML backend
- Helpers for config file workflows
- Maybe: Copy the datatype names from `Datatype.hpp` into the JSON backend in order to avoid unwanted changes
Feature-wise, this requires decoupling template mode from the simplified attribute layout; this is not yet implemented, but makes sense.
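Decoupling turns the two layout choices into independent switches, so that, for example, a regular dataset can use shorthand attributes while a template keeps explicit datatypes. The Python sketch below is hypothetical: the option names `template_mode` and `short_attributes` are illustrative and are not the backend's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JsonBackendConfig:
    """Hypothetical configuration with two independent switches."""

    template_mode: bool = False     # datasets: extent + datatype instead of data
    short_attributes: bool = False  # attributes: bare value instead of {value, datatype}


def write_attribute(cfg, value, datatype):
    # The attribute layout depends only on short_attributes.
    return value if cfg.short_attributes else {"value": value, "datatype": datatype}


def write_dataset(cfg, data, extent, datatype):
    # The dataset layout depends only on template_mode.
    if cfg.template_mode:
        return {"datatype": datatype, "extent": list(extent)}
    return {"datatype": datatype, "data": data}
```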