
Save to hdf5 before experiment is complete

Open dleibrandt opened this issue 8 years ago • 6 comments

As far as I can tell, the HDF5 files are not written until the end of an experiment. Is this correct?

If not, how do I save to HDF5 before the experiment is complete?

If so, I see this as a significant problem. We don't want to lose all the data if our computer crashes in the middle of a run. Also, we frequently run analysis scripts in the middle of experiment runs, which might be written in MATLAB or something else that doesn't play nicely with ARTIQ.

dleibrandt avatar Jul 11 '16 23:07 dleibrandt

Correct.

You are free to open and write files to do checkpointing whenever you want. The code that does the HDF5 file writing is here: https://github.com/m-labs/artiq/blob/master/artiq/master/worker_impl.py#L230 You should be able to open that same file and write to it before write_results happens; IIRC the scheduler device knows the RID. Be aware that if you want to read from another process while the HDF5 file is being written to, single-writer/multiple-reader (SWMR) requires a bit of special care in h5py. If you need automatic checkpointing in the background, we'd need a full specification.
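For example, a minimal sketch of manual checkpointing from inside an experiment. The side-file naming is arbitrary, and the assumption that the scheduler device exposes the RID as `self.scheduler.rid` is mine, not something this issue pins down:

```python
import h5py
from artiq.experiment import *


class CheckpointedScan(EnvExperiment):
    """Sketch: periodically dump intermediate results to a side HDF5 file."""

    def build(self):
        # Assumed: the scheduler virtual device carries the run ID (RID).
        self.setattr_device("scheduler")

    def checkpoint(self, counts):
        # Write to a separate file so the final results file produced by
        # worker_impl.py is not disturbed; the name here is arbitrary.
        fname = "{:09}-checkpoint.h5".format(self.scheduler.rid)
        with h5py.File(fname, "w") as f:
            f.create_dataset("counts", data=counts)

    def run(self):
        counts = []
        for i in range(100):
            counts.append(i)  # placeholder for the actual acquisition
            if i % 10 == 0:
                self.checkpoint(counts)
```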

jordens avatar Jul 12 '16 06:07 jordens

http://docs.h5py.org/en/latest/swmr.html
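In outline, SWMR mode in h5py works as below (a generic h5py sketch, not ARTIQ-specific; the file and dataset names are arbitrary):

```python
import h5py

# Writer process: SWMR requires libver="latest" and resizable datasets,
# and all objects must be created before switching swmr_mode on.
f = h5py.File("results.h5", "w", libver="latest")
dset = f.create_dataset("counts", shape=(0,), maxshape=(None,), dtype="f8")
f.swmr_mode = True  # from this point on, readers may attach

for i in range(100):
    dset.resize((i + 1,))
    dset[i] = float(i)  # placeholder for real data
    dset.flush()        # make the new rows visible to readers

f.close()

# Reader process (e.g. an external analysis script):
#   r = h5py.File("results.h5", "r", libver="latest", swmr=True)
#   d = r["counts"]
#   d.refresh()  # pick up data flushed since the file was opened
```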

jordens avatar Jul 12 '16 06:07 jordens

> If you need automatic checkpointing in the background, we'd need a full specification.

It may not need to be fully in the background. An API such as Experiment.write_results(), called automatically after Experiment.analyze() and also callable explicitly at any time by the user, may be a good choice.

sbourdeauducq avatar Jul 14 '16 02:07 sbourdeauducq

Yes. Or open the results file early (in prepare()) and expose the handle so that partial or intermediate results can be written without having to push all datasets.
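Something along these lines, say (a sketch only; none of these hooks exist in ARTIQ as of this thread, and take_data_point() is a stand-in for the actual acquisition):

```python
import h5py
from artiq.experiment import *


class EarlyOpenExperiment(EnvExperiment):
    def build(self):
        self.setattr_device("scheduler")

    def prepare(self):
        # Open the results file early and keep the handle around
        # so run()/analyze() can write intermediate data.
        fname = "{:09}-results.h5".format(self.scheduler.rid)
        self.results_file = h5py.File(fname, "w")

    def run(self):
        for i in range(10):
            data = self.take_data_point(i)  # hypothetical acquisition
            self.results_file["point_{}".format(i)] = data
            self.results_file.flush()  # data written so far survives a crash

    def analyze(self):
        self.results_file.close()
```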

jordens avatar Jul 14 '16 12:07 jordens

I think ideally I would like:

  1. All datasets are automatically written to the HDF5 file near the beginning of the experiment (maybe at the end of prepare() or the start of run()?)
  2. Periodically, datasets that have been modified can be re-written via some user-called function Experiment.update_hdf5() (ideally, for speed, only the parts of the datasets that have actually been modified would be re-written)
  3. Experiment.update_hdf5() would be called automatically at the end of analyze()

If number 2 is too hard to implement, it might be OK to give the user the option either to write all the datasets or to specify which datasets to write.
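A rough sketch of what items 2 and 3 might look like. update_hdf5(), dataset_keys() and results_path are all hypothetical; only get_dataset() is existing ARTIQ API, and the "datasets" group layout is an assumption about the results file:

```python
import h5py

def update_hdf5(self, keys=None):
    """Re-write the named datasets to the results file (all if None)."""
    if keys is None:
        keys = self.dataset_keys()  # hypothetical: names of all datasets
    with h5py.File(self.results_path, "a") as f:
        g = f.require_group("datasets")
        for k in keys:
            if k in g:
                # h5py cannot resize arbitrary datasets in place, so delete
                # and re-create (coarser than the partial writes item 2 asks for).
                del g[k]
            g[k] = self.get_dataset(k)
```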

dleibrandt avatar Jul 14 '16 17:07 dleibrandt

#1464 should improve the situation considerably by always saving to HDF5 once the run stage is reached, i.e. even if it finishes with an exception. If user code crashes or deadlocks the worker process before the exception handler runs, data can of course still be lost, so further checkpointing might still be a good idea.

dnadlinger avatar Jun 18 '20 18:06 dnadlinger