
Overwriting a dataSet using createDataSet

emanmoba opened this issue 2 years ago • 6 comments

I am using HighFive to write the output and restart files of my simulations. For steady-state problems, the intermediate solutions are written at given time intervals so that the simulation can be resumed if it is interrupted during the run. However, I want to keep only one intermediate solution at a time and get rid of the older ones. There may already be a solution to this that I have failed to find, but if there isn't, it would be great if createDataSet could take an additional boolean to overwrite a specific dataset if it already exists, without touching the other datasets.
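
In the meantime, a possible workaround sketch with the current API (the file and dataset names here are made up): unlink any existing dataset before creating it again. As noted further down in this thread, the unlinked data stays in the file as deadweight until the file is repacked.

```cpp
#include <highfive/H5File.hpp>
#include <vector>

int main() {
    using namespace HighFive;
    File file("restart.h5", File::ReadWrite | File::Create);  // hypothetical file name

    std::vector<double> solution(100, 1.0);

    // Remove the link to the previous intermediate solution, if any, then
    // recreate the dataset. The old, unlinked data is not reclaimed until
    // the file is repacked into a new HDF5 file.
    if (file.exist("intermediate_solution")) {
        file.unlink("intermediate_solution");
    }
    file.createDataSet("intermediate_solution", solution);
}
```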

emanmoba avatar Apr 06 '22 16:04 emanmoba

It is not implemented yet; we should design an API for that.

If you want to give it a try, feel free; we can help you once you open a PR.

alkino avatar Apr 13 '22 11:04 alkino

I think that H5Easy can already do this. Note, though, that unless the dataset was allocated as extendable, the new data has to have the exact same shape.

https://github.com/BlueBrain/HighFive/blob/417e4ff003dfe35f22c2352bce9d53a4fcac99ca/include/highfive/H5Easy.hpp#L76-L79
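
For example, a minimal H5Easy sketch along those lines (file and dataset names are hypothetical):

```cpp
#include <highfive/H5Easy.hpp>
#include <vector>

int main() {
    // Hypothetical file and dataset names.
    H5Easy::File file("restart.h5", H5Easy::File::ReadWrite | H5Easy::File::Create);

    std::vector<double> solution(100, 1.0);

    // The first call creates "/intermediate_solution"; subsequent calls with
    // DumpMode::Overwrite rewrite it in place, provided shape (and type) match.
    H5Easy::dump(file, "/intermediate_solution", solution, H5Easy::DumpMode::Overwrite);
    H5Easy::dump(file, "/intermediate_solution", solution, H5Easy::DumpMode::Overwrite);
}
```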


If you are considering a PR: it would make sense to provide the same API from createDataSet.

tdegeus avatar Apr 27 '22 12:04 tdegeus

Note, though, that unless the dataset was allocated as extendable, the new data has to have the exact same shape.

This is smart and should be kept for any implementation. Otherwise, I'd be concerned about growing files unintentionally.

matz-e avatar Apr 27 '22 12:04 matz-e

I think there is not even an "otherwise" ;). HDF5 cannot delete data, as it cannot reorder its storage the way your memory can. So:

  • Overwriting the exact bits is fine.
  • Datasets allocated as chunked can grow (not shrink; or at least not really: the space is not released, but it might be reused if you re-expand).

So if you want to overwrite an arbitrary dataset, you'd have to release the link between the name and the data (turning the old data into deadweight) and then add new data. Note that if that happens, you have to repack (i.e. write a new HDF5 file, e.g. with h5repack) to get rid of the deadweight.
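
A small sketch of both cases, with hypothetical file and dataset names: the first dataset is rewritten in place with data of the same shape, the second is chunked with an unlimited dimension so it can grow.

```cpp
#include <highfive/H5File.hpp>
#include <vector>

int main() {
    using namespace HighFive;
    File file("example.h5", File::Truncate);  // hypothetical file name

    std::vector<double> data(10, 0.0);
    std::vector<double> update(10, 1.0);

    // 1) Fixed-shape dataset: rewriting the same bits is fine,
    //    as long as the new data has exactly the same shape.
    auto fixed = file.createDataSet<double>("fixed", DataSpace::From(data));
    fixed.write(data);
    fixed.write(update);  // reuses the existing allocation

    // 2) Chunked dataset with an unlimited dimension: it can grow,
    //    but shrinking does not release space (it may only be reused).
    DataSetCreateProps props;
    props.add(Chunking(std::vector<hsize_t>{10}));
    auto extendable = file.createDataSet<double>(
        "extendable", DataSpace({10}, {DataSpace::UNLIMITED}), props);
    extendable.write(data);
    extendable.resize({20});                      // grow to 20 elements
    extendable.select({10}, {10}).write(update);  // fill the newly added region
}
```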

tdegeus avatar Apr 27 '22 12:04 tdegeus

Should also match the datatype, no?

matz-e avatar Apr 27 '22 12:04 matz-e

Yes!

tdegeus avatar Apr 27 '22 12:04 tdegeus