Bug writing empty data sets?

Open manauref opened this issue 4 years ago • 13 comments

We have an application (Gkeyll) which is outputting a scalar time trace to a single file during a time dependent simulation. We buffer a certain amount of data and periodically flush it out into an ADIOS file.

We've noticed that after restarting the simulation (because the first simulation ran out of wallclock time, for example) ADIOS writes some empty datasets to the file. This makes postprocessing either difficult or impossible (at present, in our workflow).

There's a description of this issue in the gkyl repo: https://github.com/ammarhakim/gkyl/issues/41

Is this something you are familiar with? Is it an ADIOS bug, an issue with the file systems (e.g. NERSC's Cori), or improper use on our side?

manauref avatar Apr 30 '21 20:04 manauref

Any ideas? I'd also be interested to hear if you think this issue is no longer present in ADIOS2

manauref avatar Jun 28 '21 17:06 manauref

So, reading that gkyl issue, it isn't really clear how we might try to reproduce the issue. Do you think this happens when appending to an existing file? The issue sounds like it might happen without restart (at the beginning of a run). Are you using steps? ADIOS2 is designed more to optimize writing the same set of variables repeatedly, whereas it looks like you are generating a new variable name for every step, which may trigger some issues.

eisenhauer avatar Jun 28 '21 18:06 eisenhauer

Can you please share such a dataset with us? At OLCF or NERSC if it is too big to share it otherwise.

Are you certain the application does not call Put() on a variable that has the defined shape {0,1}? Are you saying adios inserts these variables into the output without you asking for it?

pnorbert avatar Jun 29 '21 10:06 pnorbert

Yes @eisenhauer, it's hard to reproduce this. I've tried to do so on my laptop but have not been able to. But it continues to happen sometimes on clusters (it's happened on three clusters now: one at MIT, one at Princeton, and NERSC's Cori).

It does seem to happen when appending to an existing file. I mean, I've never seen the first dataset we put in the file be empty or have a different size than we expected. It's always been the second or later datasets we append.

I don't think we are using steps. We are simply:

  1. declaring a group (whose name changes according to the frame number)
  2. creating two variables, TimeMesh# and Data# (names changing with frame number, see files in gkyl issue)
  3. opening the file
  4. writing the data
  5. closing the file

manauref avatar Jun 29 '21 18:06 manauref

@pnorbert I'm attaching two files in which this happened:

  • gk30-wham1x2v_esEnergy.bp (generated at NERSC's Cori).
  • gk40-wham1x2v_esEnergy.bp (generated at Princeton's Stellar).

I don't think we are calling Put(). See the steps listed in the previous message to eisenhauer.

It is certainly placing datasets that shouldn't be there sometimes, specifically when it adds those empty TimeMesh and Data variables. The other problem is that sometimes it does add the data we meant to add but with a slightly different shape (rank 1 array vs. rank 2 array).

badDataShapes.zip

manauref avatar Jun 29 '21 18:06 manauref

For completeness here's the link to the section in the gkyl code in charge of writing these datasets. This layer of gkyl is written in Lua, but we've created an interface to the C Adios calls so you can probably make sense of it easily.

manauref avatar Jun 29 '21 18:06 manauref

@pnorbert if I recall correctly adios1.x files had the timers embedded (I don't remember if it was a hidden attribute or variable). Perhaps it's related. In adios2 we dump the json file for profiling, but that's an ON/OFF switch.

williamfgc avatar Jun 29 '21 18:06 williamfgc

At https://github.com/ammarhakim/gkyl/blob/82ae19b5882e12a02056d109ae3d3a2eafbf6b1a/DataStruct/DynVector.lua#L342 what are those CSV strings?

My bet is that the self._data:size() and self._numComponents values become something strange after restart, or at some other times, and you define the variables differently. In the gk30-* file you attached, TimeMeshNNN and DataNNN change shape to "scalar" at the same frame numbers. I don't remember how you can define scalars in adios1.x other than with "" for shape, but apparently sometimes you pass a string for the local dimension that counts as a scalar definition.

self._numComponents also changes to something strange (maybe 0?) sometimes. In the gk30-* file these things happened around steps 113 and 115 and lasted until 126:

  double   TimeMesh112  {1} = null  / null  / null  / null
  double   Data112      {1, 1} = null  / null  / null  / null
  double   TimeMesh113  {0} = null  / null  / null  / null
  double   Data113      {0} = null  / null  / null  / null
  double   TimeMesh114  {0} = null  / null  / null  / null
  double   Data114      {0} = null  / null  / null  / null
  double   TimeMesh115  scalar
  double   Data115      scalar
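
A postprocessing script could flag such frames before trying to read them. The following is a minimal sketch that parses listing lines in the format shown above (type, name, then either a {...} shape or the word "scalar") and reports variables that are scalar or have a zero-length dimension. The function name find_suspect_vars and the regular expression are illustrative assumptions based only on this excerpt; they are not part of any ADIOS tool.

```python
import re

# Matches lines such as "double   Data112      {1, 1} = null / ..." or
# "double   Data115      scalar". The format is assumed from the listing above.
_LINE = re.compile(r"^\s*(\w+)\s+(\S+)\s+(\{[^}]*\}|scalar)")


def find_suspect_vars(listing):
    """Return names of variables that are scalar or have a zero-sized dimension."""
    suspects = []
    for line in listing.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # skip blank or unrecognized lines
        name, shape = m.group(2), m.group(3)
        if shape == "scalar":
            suspects.append(name)
            continue
        # Keep only integer dimensions; placeholders such as "__" are ignored.
        dims = [int(d) for d in shape.strip("{}").split(",") if d.strip().isdigit()]
        if any(d == 0 for d in dims):
            suspects.append(name)
    return suspects


listing = """
  double   TimeMesh112  {1} = null  / null  / null  / null
  double   Data112      {1, 1} = null  / null  / null  / null
  double   TimeMesh113  {0} = null  / null  / null  / null
  double   Data113      {0} = null  / null  / null  / null
  double   TimeMesh115  scalar
  double   Data115      scalar
"""
print(find_suspect_vars(listing))
```

Frames 112 are left alone because their shapes are non-empty arrays; only the {0} and scalar entries are reported.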

pnorbert avatar Jun 30 '21 13:06 pnorbert

Forget my question about Put(). I did not realize this was adios 1.x.

The lua code shows me that you indeed call Adios.define_var() and Adios.write() so adios itself is not creating these variables from thin air.

pnorbert avatar Jun 30 '21 13:06 pnorbert

Using bpdump, I tried to reconstruct what happened in the gk40-* run:

  • First run went up to frame 1532
  • You had a restart at frame 1501

Up to 1500, there is one Data entry (i.e. one Adios.write() call, a local array of size (1,1)).

Var (Group) [ID]: /Data1500 (DynVector1500esEnergy.bp) [2]
        Datatype: double
        Vars Characteristics: 1
        Offset(729069)  Payload Offset(729199)  File Index(-1)  Time Index(1)   Dims (l:g:o): (1,1)

Then between 1501 and 1532, there are two entries for each frame. For 1501 the local arrays have different sizes, (1,1) + (2,1), while for the rest they are the same, (1,1) + (1,1).

Var (Group) [ID]: /Data1501 (DynVector1501esEnergy.bp) [2]
        Datatype: double
        Vars Characteristics: 2
        Offset(729520)  Payload Offset(729650)  File Index(-1)  Time Index(1)   Dims (l:g:o): (1,1)
        Offset(743960)  Payload Offset(744090)  File Index(-1)  Time Index(1)   Dims (l:g:o): (2,1)
...
Var (Group) [ID]: /Data1502 (DynVector1502esEnergy.bp) [2]
        Datatype: double
        Vars Characteristics: 2
        Offset(729971)  Payload Offset(730101)  File Index(-1)  Time Index(1)   Dims (l:g:o): (1,1)
        Offset(768446)  Payload Offset(768576)  File Index(-1)  Time Index(1)   Dims (l:g:o): (1,1)

At step 1533 everything goes back to "normal"
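
The same check can be automated. Below is a rough sketch that scans text in the dump format shown above and lists variables with more than one characteristics entry, which is how the doubled frames show up. The function name vars_with_multiple_entries and the two regular expressions are assumptions derived from this excerpt only, and the sample text is abbreviated from the output above.

```python
import re

# Header and count lines, in the format assumed from the dump excerpt above.
_VAR = re.compile(r"^Var \(Group\) \[ID\]:\s+/(\S+)")
_COUNT = re.compile(r"^\s*Vars Characteristics:\s+(\d+)")


def vars_with_multiple_entries(dump):
    """Map variable name -> number of entries, for variables with more than one."""
    result = {}
    current = None
    for line in dump.splitlines():
        m = _VAR.match(line)
        if m:
            current = m.group(1)  # remember which variable the next count belongs to
            continue
        m = _COUNT.match(line)
        if m and current is not None:
            count = int(m.group(1))
            if count > 1:
                result[current] = count
    return result


dump = """Var (Group) [ID]: /Data1500 (DynVector1500esEnergy.bp) [2]
        Datatype: double
        Vars Characteristics: 1
        Offset(729069)  Payload Offset(729199)  File Index(-1)  Time Index(1)   Dims (l:g:o): (1,1)
Var (Group) [ID]: /Data1501 (DynVector1501esEnergy.bp) [2]
        Datatype: double
        Vars Characteristics: 2
        Offset(729520)  Payload Offset(729650)  File Index(-1)  Time Index(1)   Dims (l:g:o): (1,1)
        Offset(743960)  Payload Offset(744090)  File Index(-1)  Time Index(1)   Dims (l:g:o): (2,1)
"""
print(vars_with_multiple_entries(dump))
```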

Do you agree with my theory? Is this what happened?

pnorbert avatar Jun 30 '21 13:06 pnorbert

A few comments on the IO:

  • Opening and closing the file for every write is wasteful: each open costs a file system access and then a read and reconstruction of the metadata of all previous timesteps. A step-based approach is much better.
  • Creating a new group at each frame seems useless and wasteful to me. I don't see a reason to do that.
  • Update mode does not do what I think you assume it does. It does not simply "hide" the previous content but adds to it. Hence, after the restart, the frames past the restart time, which were already written, get another entry of data.
  • Why frames 1501-1532 are interpreted as "scalar" instead of two local arrays, I don't know. It's a bug in the adios1.x BP3 format reader (which we did not know about). This should no longer be an issue with ADIOS2 and its BP4 format.
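
The update-mode behavior described in the list above can be illustrated with a toy model in which every write appends a block and nothing is overwritten. This is only a sketch of the bookkeeping, not ADIOS code; the function name, the starting frame 0, and the end of the second run (frame 1540) are made-up values for illustration, while 1532 and 1501 come from the gk40-* run.

```python
from collections import defaultdict


def simulate_appends(first_run_last, restart_frame, second_run_last):
    """Toy model of update mode: each write appends a block to its frame."""
    blocks = defaultdict(list)
    # First run writes every frame once.
    for frame in range(0, first_run_last + 1):
        blocks[frame].append("run1")
    # The restarted run writes again from the restart frame onward;
    # frames that already exist gain a second block instead of being replaced.
    for frame in range(restart_frame, second_run_last + 1):
        blocks[frame].append("run2")
    return blocks


blocks = simulate_appends(first_run_last=1532, restart_frame=1501, second_run_last=1540)
doubled = sorted(f for f, b in blocks.items() if len(b) > 1)
print(doubled[0], doubled[-1], len(doubled))  # 1501 1532 32
```

In this model only the overlap between the two runs ends up with two blocks per frame, which matches the two-entry pattern seen for frames 1501 to 1532.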

pnorbert avatar Jun 30 '21 13:06 pnorbert

@pnorbert @williamfgc thank you very much for your comments. I was able to get back to this problem and found a potential solution. First, I found a way to reproduce the appearance of "scalar" datasets. If I just call our code (gkyl) in a loop, with each iteration restarting from the previous one and taking just a couple of steps, then a "scalar" dataset appears after 30-60 iterations.

Just out of curiosity I decided to change our code for creating variables, in lines 364 and 366 of the DynVector.lua file. Previously we were passing an empty string (i.e. "") for global_dimensions because on page 41 of the ADIOS 1.13.1 manual it says to use an empty string for a local array, which is the case here because only rank 0 (of MPI_COMM_WORLD) should be executing this code. But if I pass the same string containing the array size to dimensions and global_dimensions (in our case localTmSz for the first define_var) then I no longer see the scalar datasets. Is this a bug in ADIOS 1? Will this not occur in ADIOS 2 (we'd like to move to ADIOS 2 but haven't had the time)?
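
To make the change concrete, here is a small sketch of how the comma-separated dimension strings could be built so that dimensions and global_dimensions receive the same value. It is written in Python purely for illustration (the gkyl layer is Lua). The helper name global_array_dim_strings is hypothetical, and the all-zero offsets string is an assumption: the description above only says that the two dimension strings were made identical.

```python
def global_array_dim_strings(local_shape):
    """Return (dimensions, global_dimensions, local_offsets) strings.

    The single writing process owns the whole array, so the local and
    global shapes are the same. Zero offsets are an assumption here.
    """
    dims = ",".join(str(n) for n in local_shape)
    offsets = ",".join("0" for _ in local_shape)
    return dims, dims, offsets


# Shapes taken from the frame 10 example (TimeMesh10 {5} and Data10 {5,1}).
print(global_array_dim_strings((5,)))    # ('5', '5', '0')
print(global_array_dim_strings((5, 1)))  # ('5,1', '5,1', '0,0')
```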

Some additional comments/questions:

  1. You mentioned that opening and closing is wasteful, and that a step approach is better. I believe we went with this option because: a) if we left the file open and the simulation terminated abruptly, the file could be corrupted, or b) we are writing irregular amounts of data and do not necessarily know a priori what the I/O pattern will be (it depends on the system's evolution; in frame 10 we may write TimeMesh10 {5} and Data10 {5,1} but in frame 11 we may write TimeMesh11 {23} and Data11 {23,1}). I'm not 100% sure about a); the decision might've been made before I arrived. But regarding b), can that irregular I/O pattern be accommodated in a step approach? Can you point me to the part of the manual where this is explained?
  2. I agree we should not create a group every time. I have addressed that (in the branch linked above).
  3. I don't understand your comment about update mode. My best guess is that you are referring to the case in which the first simulation goes until a later time than the time from which we restart the second simulation, in which case the second simulation may try to update some frame but since the first simulation had already created it, it doesn't update it but simply adds to it. Is that right?

manauref avatar Aug 08 '21 21:08 manauref

@manauref. It is entirely fine to define and write a global array from one process. By definition, any subset of the processes that opened an output, including a single process, can define and write a global array. There is no difference between local and global arrays at write time other than the extra metadata for global dimensions and offsets. The original problem is due to how local arrays are handled at read time when more local arrays are appended to existing steps after a restart. If you can avoid that by using global arrays, please do so.

For the other points:

1.a. This is an old fear from other file formats that get corrupted on abort. adios1 BP version 3 is safe as long as the app does not die during the write-out (the call to advance_step() or close()). For that very rare case, there is the bprecover tool, which trims the metadata so that only the intact steps remain. ADIOS2 BP4 is even better: you can still step through all intact steps before failing on the corrupt step.

1.b. Changing global dimensions over steps is allowed, but it is harder to read the data back. bpls shows __ for dimensions that change over steps, so you cannot use a simple selection to read data over all steps. However, I don't remember how to read them back in the user code. But knowing that you have one block per step, you can still read them back as local arrays using adios_inq_var_blockinfo and adios_selection_writeblock.

3. Yes, that is what I meant. There is a request for ADIOS2 to support restarting better by trimming an existing output to start clean with a given step. Once that is supported, ADIOS2 would be the better choice for your application.
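
For the irregular per-frame sizes discussed in 1.b, the postprocessing side can stitch one block per frame into a single time trace once the blocks have been read. The sketch below shows only that stitching step on plain Python lists; it does not call ADIOS, and how each block is obtained (for example via adios_inq_var_blockinfo and adios_selection_writeblock) is left out. The function name concatenate_frames and the sample values are hypothetical; the block sizes 5 and 23 come from the frame 10 and frame 11 example in the question.

```python
def concatenate_frames(frames):
    """Join per-frame (times, rows) blocks into one trace, in frame order.

    frames maps a frame number to a pair (times, rows), where rows holds
    one list of components per time sample. Empty blocks are skipped.
    """
    all_times, all_rows = [], []
    for frame in sorted(frames):
        times, rows = frames[frame]
        if len(times) == 0:
            continue  # nothing was buffered for this frame
        if len(times) != len(rows):
            raise ValueError(f"frame {frame}: {len(times)} times but {len(rows)} rows")
        all_times.extend(times)
        all_rows.extend(rows)
    return all_times, all_rows


frames = {
    10: ([0.1 * i for i in range(5)], [[float(i)] for i in range(5)]),
    11: ([0.5 + 0.1 * i for i in range(23)], [[float(i)] for i in range(23)]),
    12: ([], []),  # an empty block, like the ones seen in the problem files
}
times, rows = concatenate_frames(frames)
print(len(times), len(rows))  # 28 28
```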

pnorbert avatar Aug 09 '21 15:08 pnorbert