openPMD-api

Writing a field of custom type

Open sbastrakov opened this issue 4 years ago • 3 comments

Hi,

I am not sure what a good pattern is in the following case.

I have a large C-style array of some bitwise-copyable struct. The particular struct used depends on build parameters and comes from a 3rd-party library, so in general I don't know how this type is defined; I only know its size and that it is bitwise-copyable. I would like to write this field using storeChunk. I do not care about interpretation, only that the data read back is exactly the same. Should I reinterpret it as an array of N * sizeof(MyStruct) chars and write it like that, or is there something better?

This comes from checkpointing internal RNG states (which may come from a number of implementations) in PIConGPU.

sbastrakov avatar Sep 02 '21 11:09 sbastrakov

@ax3l do you know if work on encoding POD structs in ADIOS2 is still going? Otherwise, the two unsatisfying approaches that I see for now:

  1. Do exactly what you described. Drawback: This turns the normally self-contained openPMD datasets partially into non-portable byte-by-byte serializations. Not a huge problem in your situation, not really desirable either.
  2. Completely reorganize the data. By writing a field of struct type, you are essentially writing an array-of-structs. Turn that into a struct-of-arrays. This has a development and performance overhead.
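
For illustration, approach 2 could look roughly like the sketch below. It assumes a hypothetical struct `Particle` whose members are known and accessible, which is exactly the precondition this approach needs. The names `Particle`, `ParticlesSoA` and `toSoA` are made up for this example and are not part of openPMD-api.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical struct with known, individually accessible members.
struct Particle
{
    float x;
    float y;
    int id;
};

// Struct-of-arrays counterpart: one contiguous array per member.
// Each array has a plain, portable element type and could be
// written as its own dataset.
struct ParticlesSoA
{
    std::vector<float> x;
    std::vector<float> y;
    std::vector<int> id;
};

// Convert array-of-structs to struct-of-arrays.
// This is the extra copy that causes the performance overhead.
ParticlesSoA toSoA(std::vector<Particle> const &aos)
{
    ParticlesSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.id.reserve(aos.size());
    for (Particle const &p : aos)
    {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.id.push_back(p.id);
    }
    return soa;
}
```

The conversion has to be written per struct type and touches every member by name, so it does not carry over to a type whose layout is unknown.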

franzpoeschel avatar Sep 03 '21 11:09 franzpoeschel

I believe approach 2 requires knowing all struct members and being able to access them individually, which is not always possible if the struct comes from 3rd-party code, like the RNG state case that triggered my question.

sbastrakov avatar Sep 03 '21 11:09 sbastrakov

Thanks for the question. Yes, struct support is definitely not available in ADIOS and would need its own APIs, similar to MPI, on how to define user-specific types. You would also need to know what's in your struct then.

I would say go with approach 1. You could use fixed-size char arrays (Datatype::VEC_CHAR) in openPMD-api for that. chars are guaranteed to be one byte in size and are usually the natural type for such blobs. On second thought, Datatype::VEC_* might not be implemented for all backends; in that case, using CHAR and enlarging the array size accordingly, as you suggested, is the way to go.
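
A minimal sketch of the byte-blob idea, independent of any I/O library: `RngState` is a hypothetical stand-in for the opaque third-party type, and `toBytes` / `fromBytes` are illustrative helper names, not openPMD-api functions. The assumption is that the resulting flat `char` buffer of `count * sizeof(T)` bytes is what would then be declared as a one-dimensional `CHAR` dataset and handed to `storeChunk`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an opaque third-party RNG state.
struct RngState
{
    unsigned int s[4];
    double cached;
};

// Copy an array of trivially copyable structs into a flat byte buffer
// of count * sizeof(T) chars.
template <typename T>
std::vector<char> toBytes(T const *data, std::size_t count)
{
    static_assert(
        std::is_trivially_copyable<T>::value,
        "byte-wise serialization needs a trivially copyable type");
    std::vector<char> bytes(count * sizeof(T));
    if (!bytes.empty())
        std::memcpy(bytes.data(), data, bytes.size());
    return bytes;
}

// Rebuild the structs from a byte buffer read back from disk.
// Assumes T is also default-constructible, as this sketch uses std::vector<T>.
template <typename T>
std::vector<T> fromBytes(std::vector<char> const &bytes)
{
    static_assert(
        std::is_trivially_copyable<T>::value,
        "byte-wise deserialization needs a trivially copyable type");
    // The blob must hold a whole number of structs.
    assert(bytes.size() % sizeof(T) == 0);
    std::vector<T> out(bytes.size() / sizeof(T));
    if (!bytes.empty())
        std::memcpy(out.data(), bytes.data(), bytes.size());
    return out;
}
```

The copy in `toBytes` is only for clarity; accessing the original array through a `char` pointer obtained with `reinterpret_cast` is permitted by the C++ aliasing rules and avoids it. Such a blob is only meaningful when read back by a build with the same struct layout and endianness.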

I think we likely also used char or int8 types of some sort in the old checkpoint-restart implementation for the RNG state in PIConGPU for this... Would have to look it up :)

ax3l avatar Sep 22 '21 01:09 ax3l