
BackTransformDiagnostics: position offset does not have a consistent number of macroparticles

Open · RemiLehe opened this issue 2 years ago

When running this input file (inputs.txt), it seems that the metadata of the BackTransformed diagnostics is inconsistent. For instance, running the following Python code:

from openpmd_viewer import OpenPMDTimeSeries
ts = OpenPMDTimeSeries('./diags/diag_btd/')
ts.get_particle(['x'], iteration=3)

results in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-4c6f54023500> in <module>
----> 1 ts.get_particle(['x'], iteration=3)

~/miniconda3/lib/python3.9/site-packages/openpmd_viewer/openpmd_timeseries/main.py in get_particle(self, var_list, species, t, iteration, select, plot, nbins, plot_range, use_field_mesh, histogram_deposition, **kw)
    272         data_list = []
    273         for quantity in var_list:
--> 274             data_list.append( self.data_reader.read_species_data(
    275                 iteration, species, quantity, self.extensions))
    276         # Apply selection if needed

~/miniconda3/lib/python3.9/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/data_reader.py in read_species_data(self, iteration, species, record_comp, extensions)
    281                     filename, iteration, species, record_comp, extensions )
    282         elif self.backend == 'openpmd-api':
--> 283             return io_reader.read_species_data(
    284                     self.series, iteration, species, record_comp, extensions )
    285 

~/miniconda3/lib/python3.9/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/particle_reader.py in read_species_data(series, iteration, species_name, component_name, extensions)
     87     if component_name in ['x', 'y', 'z']:
     88         offset = get_data(series, species['positionOffset'][component_name])
---> 89         data += offset
     90     # - Return momentum in normalized units
     91     elif component_name in ['ux', 'uy', 'uz' ]:

ValueError: operands could not be broadcast together with shapes (41886,) (41894,) (41886,) 
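
For reference, one can check which BTD snapshots are affected with a quick scan over all iterations (a minimal sketch reusing the time series opened above; it just reports which iterations raise this error):

from openpmd_viewer import OpenPMDTimeSeries

ts = OpenPMDTimeSeries('./diags/diag_btd/')
for it in ts.iterations:
    try:
        # read one position component; broken snapshots raise the ValueError above
        x, = ts.get_particle(['x'], iteration=it)
        print(f'iteration {it}: {len(x)} macroparticles')
    except ValueError as err:
        print(f'iteration {it}: inconsistent particle arrays ({err})')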

RemiLehe avatar Sep 12 '22 23:09 RemiLehe

I still see this issue running WarpX on Summit, even with the new fix:

Python 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 17:18:21) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from openpmd_viewer.addons import LpaDiagnostics
>>> ts = LpaDiagnostics('diags/diag_lab')
>>> ii = 20
>>> x, z, uz = ts.get_particle(species='beam', iteration=ts.iterations[ii], var_list = ['x', 'z','uz'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ccs/home/rsandberg/.conda/envs/opmd_movie/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/main.py", line 274, in get_particle
    data_list.append( self.data_reader.read_species_data(
  File "/ccs/home/rsandberg/.conda/envs/opmd_movie/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/data_reader.py", line 283, in read_species_data
    return io_reader.read_species_data(
  File "/ccs/home/rsandberg/.conda/envs/opmd_movie/lib/python3.10/site-packages/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/particle_reader.py", line 89, in read_species_data
    data += offset
ValueError: operands could not be broadcast together with shapes (49989,) (50000,) (49989,) 

I rebuilt with the latest branch as of today, 9/23/2022. The input file is attached: inputs.txt

RTSandberg avatar Sep 23 '22 23:09 RTSandberg

Thanks for reporting this @RTSandberg.

I just tried the above script with a recent version of WarpX (git hash 6febc63b4d58) on ~~6~~ 4 GPUs and was not able to reproduce the bug. (In other words, the code runs fine and I am able to run the above Python post-processing code without any error.)

~~Could you double check this issue by re-running it with the latest version of WarpX?~~

Nevermind, I was actually able to reproduce this bug when running on 6 GPUs.

RemiLehe avatar Sep 26 '22 17:09 RemiLehe

I took a closer look at this issue (by adding lots of Print statements!), and the results are perplexing:

  • It seems that openPMD does call resetDataset with size 50000 (not 49989) when it last touches iteration 20, based on adding the following Print statement
amrex::Print() << "SetupPos: size " << np << std::endl;

here: https://github.com/ECP-WarpX/WarpX/blob/development/Source/Diagnostics/WarpXOpenPMD.cpp#L985 and

amrex::Print() << "resetDataset " << comp << std::endl;

here: https://github.com/ECP-WarpX/WarpX/blob/development/Source/Diagnostics/WarpXOpenPMD.cpp#L992

  • It also seems that we do call storeChunk immediately afterwards.

RemiLehe avatar Sep 29 '22 00:09 RemiLehe

Let's try amrex::AllPrint() and see if all ranks reach the location of resetDataset and also call it with the same values.

ax3l avatar Sep 29 '22 00:09 ax3l

Thanks @ax3l ! I just checked: all MPI ranks are indeed calling resetDataset with the right size (50000). I am still confused as to why the array does not seem to have the right size when reading it.

RemiLehe avatar Sep 29 '22 16:09 RemiLehe

Note that this error still occurs even when turning off load-balancing by commenting out this line:

#algo.load_balance_intervals = 200

RemiLehe avatar Sep 29 '22 23:09 RemiLehe

It seems that when I do:

diag_lab.fields_to_plot = none

then the problem disappears.

However, I am a bit confused: why does turning off the fields fix the particle metadata?

RemiLehe avatar Oct 03 '22 17:10 RemiLehe

Thank you for this hint and test. I wonder if, when we call Redistribute, the bounds set by the fields MultiFab remove some particles. I will test this and report back here.

RevathiJambunathan avatar Oct 03 '22 17:10 RevathiJambunathan

@RevathiJambunathan That's a good suggestion, but it does not seem that there is any issue with the Redistribute: as described above, adding Print statements shows that we are making the right openPMD calls with the right sizes. It seems that the problem is at a lower level...

RemiLehe avatar Oct 03 '22 18:10 RemiLehe

I have another input deck that sees this issue running on 1 node, 6 GPUs on Summit:

ts = LpaDiagnostics('./diags/diag_btd/')
ts.get_particle(species='beam', var_list=['w','x'], iteration=ts.iterations[1])

Output:

[array([0.12483018, 0.12483018, 0.12483018, ..., 0.12483018, 0.12483018,
        0.12483018]),
 array([ 6.60118937e-07, -3.75790632e-07,  6.18016430e-07, ...,
         2.38995954e-07, -2.38995954e-07, -2.38995954e-07])]

ts.get_particle(species='beam', var_list=['w','x'], iteration=ts.iterations[2])

Output:

File ~/src/openPMD-viewer/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/particle_reader.py:89, in read_species_data(series, iteration, species_name, component_name, extensions)
     87 if component_name in ['x', 'y', 'z']:
     88     offset = get_data(series, species['positionOffset'][component_name])
---> 89     data += offset
     90 # - Return momentum in normalized units
     91 elif component_name in ['ux', 'uy', 'uz' ]:

ValueError: operands could not be broadcast together with shapes (49992,) (50000,) (49992,) 

inputs.txt

RTSandberg avatar Nov 02 '22 18:11 RTSandberg

On one of the production simulations, I just noticed that using:

<diag>.openpmd_backend = h5

circumvents the problem.

@RTSandberg Are you able to reproduce this observation with the above input deck?

RemiLehe avatar Nov 14 '22 18:11 RemiLehe

Thanks Remi! @RTSandberg, as a work-around for ADIOS2, can you please check whether using two diagnostics - one with only particles and one with only fields - works for ADIOS2 as well?

I'll take this on in the 2nd half of the week with the inputs reported by Ryan: https://github.com/ECP-WarpX/WarpX/issues/3389#issuecomment-1301065526

Update: could not dig into this during the week - will pick up after my vacations.

ax3l avatar Nov 14 '22 18:11 ax3l

@ax3l Yes, it seems that the following work-around did work in my case. Thanks for the suggestions.

Here is the corresponding input script: inputs.txt

So in summary, there seem to be two potential work-arounds (sketched below):

  • splitting the diagnostics into a fields-specific and a particle-specific diagnostic
  • switching to HDF5
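
A minimal sketch of both work-arounds in input-deck form (the diagnostic names are placeholders, and fld_btd.write_species = 0 is an assumed way of suppressing particle output in the field-only diagnostic):

# work-around 1: split the BTD into a particle-only and a field-only diagnostic
diagnostics.diags_names = diag_btd fld_btd
diag_btd.fields_to_plot = none    # particles only
fld_btd.write_species = 0         # fields only (assumed parameter)

# work-around 2: alternatively, switch the openPMD backend to HDF5
diag_btd.openpmd_backend = h5
fld_btd.openpmd_backend = h5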

RemiLehe avatar Nov 14 '22 19:11 RemiLehe

The HDF5 workaround works for me. At one time I thought I had a case where the split-diagnostic workaround didn't work, but at present it works in every case I have tried lately.

RTSandberg avatar Nov 16 '22 20:11 RTSandberg

I haven't observed this issue when using HDF5. However, with ADIOS the split diagnostic workaround is not guaranteed; I still get incomplete particle metadata sometimes.

from openpmd_viewer.addons import LpaDiagnostics
btd = LpaDiagnostics('diags/diag_btd')
btd.get_particle(iteration=0, var_list=['id','x','y','z','ux','uy','uz'])

results in

ValueError: operands could not be broadcast together with shapes (49988,) (49992,) (49988,)

Here are the batch and input script for running on 1 node, 6 GPUs on Summit: summit_batch.txt inputs.txt

It is very interesting to note that this issue does not arise if the field BTD is removed, for example by changing the line diagnostics.diags_names = diag diag_btd fld_btd to diagnostics.diags_names = diag diag_btd.

I observed both behaviors on Perlmutter as well

RTSandberg avatar Nov 30 '22 20:11 RTSandberg

That is indeed quite weird! Thanks for the input file, @RTSandberg. I can take a look at this.

RevathiJambunathan avatar Nov 30 '22 22:11 RevathiJambunathan

I am running an even smaller version of the above inputs on my laptop: inputs.txt

cmake -S . -B build -DWarpX_PSATD=ON -DWarpX_DIMS=RZ
cmake --build build -j 3

This finishes after 512 steps and, e.g. for lab station 1, does not write all the particle meta-data (e.g., charge & mass) properly, even though it shuts down cleanly.

That means WarpXOpenPMDPlot::SetConstParticleRecordsEDPIC is not properly called, which means the control variable is_last_flush_to_step is not properly tracked, which means isLastBTDFlush is consequently not passed to the IO backend when it is time.

ax3l avatar Dec 16 '22 00:12 ax3l

Ah, I see, thanks. I can trace this with the input file you shared and see what's happening with isLastBTDFlush.

RevathiJambunathan avatar Dec 16 '22 01:12 RevathiJambunathan

@ax3l I used your input file and ran this locally.

At the 512th timestep, the snapshots are not full and hence isLastBTDFlush will not be set to 1. This is also indicated by the warning at the beginning of the output, which suggests that the snapshots will be full at the 2836th timestep.

**** WARNINGS ******************************************************************
* GLOBAL warning list  after  [ FIRST STEP ]
*
* --> [!  ] [BTD] [raised twice]
*
*     Simulation might not run long enough to fill all BTD snapshots.
*     Final step: 512
*     Last BTD snapshot fills around step: 2836
*     @ Raised by: ALL
*
********************************************************************************

So I ran the simulation up to 2850 timesteps, and I am able to visualize the data using openpmd-viewer (see the attached image).

RevathiJambunathan avatar Dec 16 '22 19:12 RevathiJambunathan

I think that's an orthogonal issue to fix - I think we should close out BTD steps (finalize particle meta-data & zero-out partly filled fields) on simulation end.

With our restart strategy, we would copy the partly filled lab stations into checkpoints and keep them open (un-finalized) in case we want to continue.

Will continue focusing on the above issue posted by Ryan for now.

ax3l avatar Dec 19 '22 11:12 ax3l

https://github.com/ECP-WarpX/WarpX/issues/3389#issuecomment-1332719750

@RTSandberg I cannot reproduce the issue with the latest openPMD-viewer + openPMD-api and running on CPU. (Works w/o an error.) Will try again on GPU on Summit...

ax3l avatar Dec 19 '22 14:12 ax3l

Can reproduce on Summit :+1: (note in the dump below that the particle arrays hold 49712 elements, while the shape attributes claim 49992)

bpls -l diag_btd/openpmd_000000.bp
  uint64_t  /data/0/particles/beam/id          {49712} = 0 / 0
  double    /data/0/particles/beam/momentum/x  {49712} = 0 / 0
  double    /data/0/particles/beam/momentum/y  {49712} = 0 / 0
  double    /data/0/particles/beam/momentum/z  {49712} = 0 / 0
  double    /data/0/particles/beam/position/x  {49712} = 0 / 0
  double    /data/0/particles/beam/position/y  {49712} = 0 / 0
  double    /data/0/particles/beam/position/z  {49712} = 0 / 0
  double    /data/0/particles/beam/theta       {49712} = 0 / 0
  double    /data/0/particles/beam/weighting   {49712} = 0 / 0
bpls -A -l diag_btd/openpmd_000000.bp
  string    /basePath                                             attr   = "/data/%T/"
  uint8_t   /data/0/closed                                        attr   = 1
  double    /data/0/dt                                            attr   = 1
  uint32_t  /data/0/particles/beam/charge/macroWeighted           attr   = 0
  uint64_t  /data/0/particles/beam/charge/shape                   attr   = 49992
  float     /data/0/particles/beam/charge/timeOffset              attr   = 0
  double    /data/0/particles/beam/charge/unitDimension           attr   = {0, 0, 1, 1, 0, 0, 0}
  double    /data/0/particles/beam/charge/unitSI                  attr   = 1
  double    /data/0/particles/beam/charge/value                   attr   = -1.60218e-19
  double    /data/0/particles/beam/charge/weightingPower          attr   = 1
  string    /data/0/particles/beam/currentDeposition              attr   = "directMorseNielson"
  uint32_t  /data/0/particles/beam/id/macroWeighted               attr   = 0
  float     /data/0/particles/beam/id/timeOffset                  attr   = 0
  double    /data/0/particles/beam/id/unitDimension               attr   = {0, 0, 0, 0, 0, 0, 0}
  double    /data/0/particles/beam/id/unitSI                      attr   = 1
  double    /data/0/particles/beam/id/weightingPower              attr   = 0
  uint32_t  /data/0/particles/beam/mass/macroWeighted             attr   = 0
  uint64_t  /data/0/particles/beam/mass/shape                     attr   = 49992
  float     /data/0/particles/beam/mass/timeOffset                attr   = 0
  double    /data/0/particles/beam/mass/unitDimension             attr   = {0, 1, 0, 0, 0, 0, 0}
  double    /data/0/particles/beam/mass/unitSI                    attr   = 1
  double    /data/0/particles/beam/mass/value                     attr   = 9.10938e-31
  double    /data/0/particles/beam/mass/weightingPower            attr   = 1
  uint32_t  /data/0/particles/beam/momentum/macroWeighted         attr   = 0
  float     /data/0/particles/beam/momentum/timeOffset            attr   = 0
  double    /data/0/particles/beam/momentum/unitDimension         attr   = {1, 1, -1, 0, 0, 0, 0}
  double    /data/0/particles/beam/momentum/weightingPower        attr   = 1
  double    /data/0/particles/beam/momentum/x/unitSI              attr   = 1
  double    /data/0/particles/beam/momentum/y/unitSI              attr   = 1
  double    /data/0/particles/beam/momentum/z/unitSI              attr   = 1
  string    /data/0/particles/beam/particleInterpolation          attr   = "momentumConserving"
  string    /data/0/particles/beam/particlePush                   attr   = "Vay"
  double    /data/0/particles/beam/particleShape                  attr   = 3
  double    /data/0/particles/beam/particleShapes                 attr   = {3, 3}
  string    /data/0/particles/beam/particleSmoothing              attr   = "none"
  uint32_t  /data/0/particles/beam/position/macroWeighted         attr   = 0
  float     /data/0/particles/beam/position/timeOffset            attr   = 0
  double    /data/0/particles/beam/position/unitDimension         attr   = {1, 0, 0, 0, 0, 0, 0}
  double    /data/0/particles/beam/position/weightingPower        attr   = 0
  double    /data/0/particles/beam/position/x/unitSI              attr   = 1
  double    /data/0/particles/beam/position/y/unitSI              attr   = 1
  double    /data/0/particles/beam/position/z/unitSI              attr   = 1
  uint32_t  /data/0/particles/beam/positionOffset/macroWeighted   attr   = 0
  float     /data/0/particles/beam/positionOffset/timeOffset      attr   = 0
  double    /data/0/particles/beam/positionOffset/unitDimension   attr   = {1, 0, 0, 0, 0, 0, 0}
  double    /data/0/particles/beam/positionOffset/weightingPower  attr   = 0
  uint64_t  /data/0/particles/beam/positionOffset/x/shape         attr   = 49992
  double    /data/0/particles/beam/positionOffset/x/unitSI        attr   = 1
  double    /data/0/particles/beam/positionOffset/x/value         attr   = 0
  uint64_t  /data/0/particles/beam/positionOffset/y/shape         attr   = 49992
  double    /data/0/particles/beam/positionOffset/y/unitSI        attr   = 1
  double    /data/0/particles/beam/positionOffset/y/value         attr   = 0
  uint64_t  /data/0/particles/beam/positionOffset/z/shape         attr   = 49992
  double    /data/0/particles/beam/positionOffset/z/unitSI        attr   = 1
  double    /data/0/particles/beam/positionOffset/z/value         attr   = 0
  uint32_t  /data/0/particles/beam/theta/macroWeighted            attr   = 0
  float     /data/0/particles/beam/theta/timeOffset               attr   = 0
  double    /data/0/particles/beam/theta/unitDimension            attr   = {0, 0, 0, 0, 0, 0, 0}
  double    /data/0/particles/beam/theta/unitSI                   attr   = 1
  double    /data/0/particles/beam/theta/weightingPower           attr   = 0
  uint32_t  /data/0/particles/beam/weighting/macroWeighted        attr   = 1
  float     /data/0/particles/beam/weighting/timeOffset           attr   = 0
  double    /data/0/particles/beam/weighting/unitDimension        attr   = {0, 0, 0, 0, 0, 0, 0}
  double    /data/0/particles/beam/weighting/unitSI               attr   = 1
  double    /data/0/particles/beam/weighting/weightingPower       attr   = 1
  double    /data/0/time                                          attr   = 0
  double    /data/0/timeUnitSI                                    attr   = 1
  string    /date                                                 attr   = "2022-12-19 10:15:36 -0500"
  string    /iterationEncoding                                    attr   = "fileBased"
  string    /iterationFormat                                      attr   = "openpmd_%06T"
  string    /meshesPath                                           attr   = "fields/"
  string    /openPMD                                              attr   = "1.1.0"
  uint32_t  /openPMDextension                                     attr   = 1
  string    /particlesPath                                        attr   = "particles/"
  string    /software                                             attr   = "WarpX"
  string    /softwareVersion                                      attr   = "22.12-7-gc3eb6ea1efef"
  uint64_t  __openPMD_internal/openPMD2_adios2_schema             attr   = 0

ax3l avatar Dec 19 '22 15:12 ax3l

Since this shows up in the first lab station, we can simplify to:

...
diagnostics.diags_names = diag_btd fld_btd  # was: diag diag_btd fld_btd
...
diag_btd.num_snapshots_lab = 1  # was: 100
fld_btd.num_snapshots_lab = 1  # was: 100
...

and run faster: inputs.txt

The problem disappears if I remove fld_btd, so there is some side-effect going on here.

Seems important to run it from 6 MPI ranks (and/or with CUDA) to trigger with the inputs.

ax3l avatar Dec 19 '22 15:12 ax3l

I am generally able to use the particle BTD in RZ with ADIOS as the openPMD backend without issue, as long as I don't also request field output, i.e. I set fields_to_plot = none. I just found an instance where this configuration still leads to inconsistent numbers of macroparticles. It is difficult to reproduce and this is not a great example; I record it here just as evidence that the ADIOS + fields_to_plot = none workaround is not guaranteed to work.

inputs.txt

RTSandberg avatar Jan 12 '23 21:01 RTSandberg

Met with @RemiLehe and @n01r to discuss the issue further. We realized that it is not the metadata that is necessarily inconsistent, but that the particle arrays themselves are corrupted. For example, if we run

for ii, iteration in enumerate(ts_l.iterations):
    beam_ws, = ts_l.get_particle(iteration=iteration, var_list=['w'], species='beam')
    print(f'iteration {iteration}: # particles = {len(beam_ws)}')

we get the output

iteration 0: # particles = 15239
iteration 1: # particles = 49478
iteration 2: # particles = 50000
iteration 3: # particles = 50000
iteration 4: # particles = 50000
iteration 5: # particles = 50000
iteration 6: # particles = 50000
iteration 7: # particles = 1375

input and batch scripts: batch.txt inputs.txt

RTSandberg avatar Jan 18 '23 00:01 RTSandberg

Yes, that is correct. I forgot to post this here and only mentioned it on Slack: in my debugging, for one corrupted BTD step I saw that the last resizing storeChunk calls for particle data are issued but do not end up on disk.

That means the meta-data that is set in that last append of particles is correct, but the variable (data) is too small.
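
One can check this directly in the output files, e.g. with a small openPMD-api script (a sketch; the path, file pattern and species name are assumptions matching the inputs above) that compares the extent of a written particle array with the shape recorded for the constant positionOffset component:

import openpmd_api as io

series = io.Series("diags/diag_btd/openpmd_%T.bp", io.Access.read_only)
for index, it in series.iterations.items():
    beam = it.particles["beam"]
    n_data = beam["position"]["x"].shape[0]        # elements actually written
    n_meta = beam["positionOffset"]["x"].shape[0]  # extent claimed by the metadata
    status = "OK" if n_data == n_meta else "MISMATCH"
    print(f"iteration {index}: data={n_data} metadata={n_meta} {status}")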

ax3l avatar Jan 18 '23 18:01 ax3l

> Since this shows up in the first lab station, we can simplify to:
>
> ...
> diagnostics.diags_names = diag_btd fld_btd  # was: diag diag_btd fld_btd
> ...
> diag_btd.num_snapshots_lab = 1  # was: 100
> fld_btd.num_snapshots_lab = 1  # was: 100
> ...
>
> and run faster: inputs.txt
>
> The problem disappears if I remove fld_btd, so there is some side-effect going on here.
>
> Seems important to run it from 6 MPI ranks (and/or with CUDA) to trigger with the inputs.

I used this input file, and I saw that the number of particles is consistent:

../bpls -l diag_btd/openpmd_000000.bp
  uint64_t  /data/0/particles/beam/id          {49992} = 1 / 50000
  double    /data/0/particles/beam/momentum/x  {49992} = -1.17892e-19 / 1.17892e-19
  double    /data/0/particles/beam/momentum/y  {49992} = -1.10237e-19 / 1.10237e-19
  double    /data/0/particles/beam/momentum/z  {49992} = 4.15973e-19 / 6.32712e-19
  double    /data/0/particles/beam/position/x  {49992} = -7.29641e-05 / 7.29641e-05
  double    /data/0/particles/beam/position/y  {49992} = -8.36205e-05 / 8.36205e-05
  double    /data/0/particles/beam/position/z  {49992} = -0.000179486 / -1.4209e-05
  double    /data/0/particles/beam/theta       {49992} = -3.14159 / 3.14159
  double    /data/0/particles/beam/weighting   {49992} = 0.12483 / 0.12483

../bpls -A -l diag_btd/openpmd_000000.bp | grep shape
  uint64_t  /data/0/particles/beam/charge/shape            attr = 49992
  uint64_t  /data/0/particles/beam/mass/shape              attr = 49992
  uint64_t  /data/0/particles/beam/positionOffset/x/shape  attr = 49992
  uint64_t  /data/0/particles/beam/positionOffset/y/shape  attr = 49992
  uint64_t  /data/0/particles/beam/positionOffset/z/shape  attr = 49992

guj avatar Jan 19 '23 18:01 guj

Testing a fix in #3657

ax3l avatar Jan 31 '23 22:01 ax3l

Idea for a fast test case of the issue (if we need it again):

  • simulation box filled to 1/3rd with particles: the lower 2/3rd has no particles
  • 2 MPI ranks, distributed along x
  • BTD diagnostics for particles

ax3l avatar Feb 01 '23 01:02 ax3l

Please note that ADIOS2 master now has support for Joined Arrays (in BP4 only for now), where you don't need to specify Shape on the writing side (nor Starts), and the reader will put all blocks into a Global Array. This is basically Local Arrays where the blocks are joined in one dimension together to form a Global Array, and is a good match for this use case. https://github.com/ornladios/ADIOS2/pull/3466

pnorbert avatar Feb 03 '23 17:02 pnorbert