openPMD-api
Dataset not properly initialized?
Describe the bug: Dataset class not properly initialized.
To Reproduce: trying the C++ code from the repository, 1D domain decomposition / particle example:
Dataset dataset = Dataset(determineDatatype<double>(), {mpi_size});
if (0 == mpi_rank)
cout << "Prepared a Dataset of size " << dataset.extent[0] << " x "
<< dataset.extent[1] << " and Datatype " << dataset.dtype
<< '\n';
Obtained output:
Prepared a Dataset of size 50 x 47760564901816 and Datatype DOUBLE
It seems that dataset.extent[1] is not initialized?
The extent is initialized from the second parameter of the Dataset constructor, where you pass a vector with a single entry ({mpi_size}). Accessing the second entry of that vector with dataset.extent[1] is therefore an out-of-bounds access.
Depending on what you are trying to do, you would have to either add a second entry to the dataset extent or access only the first.
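For illustration, a minimal sketch of both options (dataset2d/dataset1d and the second dimension of 300 are arbitrary placeholders):
// Option 1: declare a two-dimensional dataset, so both extent entries exist
Dataset dataset2d = Dataset(determineDatatype<double>(), {static_cast<uint64_t>(mpi_size), 300});
if (0 == mpi_rank)
    cout << "Size " << dataset2d.extent[0] << " x " << dataset2d.extent[1] << '\n';
// Option 2: keep the one-dimensional dataset and access only extent[0]
Dataset dataset1d = Dataset(determineDatatype<double>(), {static_cast<uint64_t>(mpi_size)});
if (0 == mpi_rank)
    cout << "Size " << dataset1d.extent[0] << '\n';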
Did you copy this from examples/5_write_parallel.cpp? In that example, things look correct:
Datatype datatype = determineDatatype<float>();
Extent global_extent = {10ul * mpi_size, 300};
Dataset dataset = Dataset(datatype, global_extent);
if (0 == mpi_rank)
cout << "Prepared a Dataset of size " << dataset.extent[0] << "x"
<< dataset.extent[1] << " and Datatype " << dataset.dtype
<< '\n';
Note that extent is a vector with two entries here.
Maybe you can help. I would like to extend the following example for parallel particle writing
{
// open file for writing
Series o = Series(
"/lustre/rz/dbertini/otest/openpmd-api.h5", Access::CREATE, MPI_COMM_WORLD);
ParticleSpecies &e = o.iterations[1].particles["e"];
std::vector<double> position_global(mpi_size);
double pos{0.};
std::generate(position_global.begin(), position_global.end(), [&pos] {
return pos++;
});
std::shared_ptr<double> position_local(new double);
*position_local = position_global[mpi_rank];
std::vector<float> position_data(1, position_global[mpi_rank]);
Dataset dataset = Dataset(determineDatatype<double>(), {static_cast<uint64_t>(mpi_size)});
if (0 == mpi_rank)
cout << "Prepared a Dataset of size " << dataset.extent[0]
<< " and Datatype " << dataset.dtype << '\n';
e["position"]["x"].resetDataset(dataset);
e["position"]["x"].storeChunk(position_local, Offset{static_cast<uint64_t>(mpi_rank)}, Extent{1});
o.flush();
}
which works fine, to the case where position_local is not a single value but a std::vector.
Particles are generally a list (more precisely: multiple lists, e.g. for the x, y, z positions, all of the same size), i.e. a one-dimensional array.
The openPMD-api does not have a distinct particle API for the Dataset and RecordComponent classes, so you still use the more generic rc.resetDataset(Dataset(Datatype::FLOAT, Extent{100})) call, but specify an Extent vector with one single entry to get a one-dimensional array.
For parallel writing, you specify the global extent across all MPI ranks in that call. Later, when storing actual data, you specify the position of your locally written block, e.g. rc.storeChunk(data, Offset{50}, Extent{10}), meaning that this rank writes 10 items beginning at offset 50. The calculation of offset, local and global extent is the application's task.
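To make that concrete, here is a minimal sketch of that calculation for the case where each rank holds a std::vector<double> of positions. local_x is a hypothetical per-rank vector, e and series are as in your snippet, and a contiguous 1D layout of the ranks is assumed:
// Hypothetical per-rank particle data (100 particles per rank as a placeholder)
std::vector<double> local_x(100, double(mpi_rank));
uint64_t local_n = local_x.size();
uint64_t global_n = 0; // total number of particles across all ranks
uint64_t offset = 0;   // number of particles on all lower ranks
MPI_Allreduce(&local_n, &global_n, 1, MPI_UINT64_T, MPI_SUM, MPI_COMM_WORLD);
MPI_Exscan(&local_n, &offset, 1, MPI_UINT64_T, MPI_SUM, MPI_COMM_WORLD);
if (0 == mpi_rank)
    offset = 0; // MPI_Exscan leaves the receive buffer undefined on rank 0

auto &rc = e["position"]["x"];
rc.resetDataset(Dataset(determineDatatype<double>(), {global_n}));
rc.storeChunk(local_x, Offset{offset}, Extent{local_n});
series.flush(); // local_x must stay alive until this flush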
@ax3l I think that making this use case a bit more comfortable is the purpose of ADIOS2 local arrays, so maybe we should add support for them at some point? Maybe first ask Norbert if it's worth it.
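For context, a rough sketch of what a local array looks like in the native ADIOS2 C++ API (not openPMD-api; the IO name "particles" and the variable position_x are placeholders). Shape and start stay empty and only a per-writer count is given, so no global offset calculation is needed:
// requires <adios2.h>, ADIOS2 built with MPI, and an initialized MPI environment
std::vector<double> local_x(100, 1.0); // placeholder per-rank data
adios2::ADIOS adios(MPI_COMM_WORLD);
adios2::IO io = adios.DeclareIO("particles");
// empty shape and start, only a count: each rank writes its own local block
auto var = io.DefineVariable<double>("position_x", {}, {}, {local_x.size()});
adios2::Engine writer = io.Open("particles.bp", adios2::Mode::Write);
writer.Put(var, local_x.data());
writer.Close();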
Maybe this helps: the electron species as written by a PIConGPU simulation. The last column indicates the extent:
float /data/200/particles/e/position/x {2621434}
float /data/200/particles/e/position/y {2621434}
float /data/200/particles/e/position/z {2621434}
int32_t /data/200/particles/e/positionOffset/x {2621434}
int32_t /data/200/particles/e/positionOffset/y {2621434}
int32_t /data/200/particles/e/positionOffset/z {2621434}
float /data/200/particles/e/weighting {2621434}
I tried something like:
ParticleSpecies &e = series.iterations[1].particles["e"];
auto const value = double(mpi_size);
std::vector<double> local_data(100, value);
//for (int k=0; k<arrays.l_px; k++) local_data.push_back(arrays.px[k]);
//ParticleSpecies e = series.iterations[1].particles[spec.c_str()];
// example 1D domain decomposition in first index
Datatype datatype = determineDatatype<double>();
Extent global_extent = {100ul * mpi_size};
Dataset dataset = Dataset(datatype, global_extent);
if (0 == mpi_rank)
cout << "Prepared a Dataset of size " << dataset.extent[0]
<< " and Datatype " << dataset.dtype << '\n';
Offset chunk_offset = {100ul * mpi_rank};
Extent chunk_extent = {100};
e["position"]["x"].resetDataset(dataset);
e["position"]["x"].storeChunk(local_data, chunk_offset, chunk_extent);
if (0 == mpi_rank)
cout << "Registered a single chunk per MPI rank containing its "
"contribution, "
"ready to write content to disk\n";
series.flush();
but it does not seem to work. What am I doing wrong?
Well, it works (the HDF5 file is dumped correctly, as expected), but the SLURM scheduler shows an error message indicating a problem in the execution of the code:
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: lxbk0833 [3] pmixp_client_v2.c:210 [_errhandler] mpi/pmix: ERROR: Error handler invoked: status = -25: Interrupted system call (4)
Is the data in the created HDF5 file also correct? If you have it installed, you can check whether the same error happens with ADIOS2 (change the .h5 extension to .bp), but otherwise I haven't seen this error either. Your code looks correct, so I would assume either something system-specific or something from within HDF5. Is the batch file correct? Maybe you do something that triggers Slurm to kill the application.
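For reference, switching backends only requires changing the filename extension in the Series constructor; everything else stays the same:
// same call as before, but the .bp extension selects the ADIOS2 backend
Series o = Series(
    "/lustre/rz/dbertini/otest/openpmd-api.bp", Access::CREATE, MPI_COMM_WORLD);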
The problem seems to be cluster-related ... nothing linked to openpmd-api ... sorry for the noise.
It seems to work now!
Glad to hear!
Thanks for your competent support!