Storage of intermediate results

Open elbaraim opened this issue 2 years ago • 14 comments

Feature description: Allow pyPESTO to store intermediate results before the whole process (e.g. optimization, sampling) has finished.

Motivation/Application: This is especially important when working with computationally demanding models. For example, one may want to assess parameter uncertainty using a large number of samples; due to time constraints (e.g. when running on a server), the process can get killed shortly before finishing, and all samples generated up to that point are lost.

E.g., recently I got this painful message:

99%|█████████▉| 992972/1000000 [167:59:43<49:23,  2.37it/s]slurmstepd: error: *** JOB 1338355 ON node43 CANCELLED AT 2021-08-19T11:59:07 DUE TO TIME LIMIT ***

of a process that took 7 days (and now all is lost) :(

This occurred in the context of sampling.

elbaraim avatar Aug 19 '21 13:08 elbaraim

And -- as a remark -- this is not an isolated case :(

EDIT: Maybe the group of persons involved in the storage development can have a look?

elbaraim avatar Aug 19 '21 13:08 elbaraim

Looks like something @PaulJonasJost could be up to?

jvanhoefer avatar Aug 19 '21 15:08 jvanhoefer

I second the suggestion by @elbaraim - Perhaps the right idea would be to modify the tqdm decorator (by adding a parameter for the write out interval) or implement another dedicated decorator.

A related issue (or the same) is that in my multi-start optimizations I periodically need to save my .h5 results file (collecting the individual runs). If a job on some compute infrastructure runs into the wall-time limit, no .h5 results file is generated. It would be also good to have a periodic write-out of these updated .h5 files.
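
The write-out interval suggested above can be sketched without touching tqdm itself. This is a minimal illustration, not pyPESTO code: `with_write_out`, `every`, and `write_out` are hypothetical names, and the callback stands in for whatever actually persists the results (e.g. updating an .h5 file).

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_write_out(
    items: Iterable[T], every: int, write_out: Callable[[int], None]
) -> Iterator[T]:
    """Yield items unchanged and call write_out(n_done) every `every` items.

    Hypothetical helper: the callback runs after the consumer has finished
    processing an item, because a generator resumes only when the next
    item is requested.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    n_done = 0
    for item in items:
        yield item
        n_done += 1
        if n_done % every == 0:
            write_out(n_done)
    # Flush whatever is left over from the last, incomplete interval.
    if n_done % every != 0:
        write_out(n_done)


# Usage: collect "samples" in memory and record when a write-out would happen.
samples = []
written = []

for i in with_write_out(range(10), every=4, write_out=written.append):
    samples.append(i * i)

print(written)  # [4, 8, 10]
```

Because the wrapper is a plain generator, it could in principle be combined with a progress bar by wrapping the iterable once more; whether that fits the way pyPESTO uses tqdm internally is not something this sketch addresses.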

stephanmg avatar Aug 20 '21 06:08 stephanmg

It should already be possible to store intermediate results for optimization using the objective history.

FFroehlich avatar Aug 20 '21 14:08 FFroehlich

trace_save_iter from class pypesto.HistoryOptions? (I think this is wrong, but maybe isn't.)

stephanmg avatar Aug 20 '21 15:08 stephanmg

That attribute controls how frequently results are stored, but it needs to be activated in the first place.

FFroehlich avatar Aug 20 '21 15:08 FFroehlich

@FFroehlich okay -> Concerning my related problem, I presume saving a results.h5 file collecting all already finished optimization runs (Let's say I'm doing 100 total runs and I want to periodically save/update my results.h5 file) isn't available, right? I hope I'm not getting this wrong.

stephanmg avatar Aug 20 '21 15:08 stephanmg

I second the suggestion by @elbaraim - Perhaps the right idea would be to modify the tqdm decorator (by adding a parameter for the write out interval) or implement another dedicated decorator. A related issue (or the same) is that in my multi-start optimizations I periodically need to save my .h5 results file (collecting the individual runs). If a job on some compute infrastructure runs into the wall-time limit, no .h5 results file is generated. It would be also good to have a periodic write-out of these updated .h5 files.

It should already be possible to store intermediate results for optimization using the objective history.

Yes, for optimization everything should already be possible via the history class and the optional trace_save_iter. Essentially, for optimization, we are only interested in single optimal values, which can easily be managed and extracted from that history object (except if the optimizer also evaluates points violating constraints). For sampling, this is different.

yannikschaelte avatar Aug 20 '21 17:08 yannikschaelte

@FFroehlich okay -> Concerning my related problem, I presume saving a results.h5 file collecting all already finished optimization runs (Let's say I'm doing 100 total runs and I want to periodically save/update my results.h5 file) isn't available, right? I hope I'm not getting this wrong.

Correct, see #517.

FFroehlich avatar Aug 20 '21 18:08 FFroehlich

@yannikschaelte so the following code should update my results.csv or results.h5 after the completion of each optimization run?

from datetime import date
import pypesto
history_name = f"results_{date.today()}.csv"  # or .h5
history_options = pypesto.HistoryOptions(trace_record=True, trace_save_iter=1, storage_file=history_name)  # save the trace after every iteration

stephanmg avatar Aug 23 '21 11:08 stephanmg

As far as I know and have tested, this should happen automatically with the HDF5 history. The only thing I am not sure about is whether the interrupted run is saved cleanly. But this applies only to optimization; I am not sure how it works with sampling and will have a look.

PaulJonasJost avatar Aug 31 '21 07:08 PaulJonasJost

Hello @PaulJonasJost, yes the interrupted run might be in a "dirty" state, so the file isn't readable afterwards, which is okay (tested).
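
For checkpoint files that are rewritten as a whole, one common way to avoid such a "dirty" state is to write to a temporary file and rename it into place. The sketch below is a generic illustration and not how the pyPESTO HDF5 history works; `atomic_write_json` is a hypothetical helper, and JSON only stands in for the real storage format.

```python
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write payload to path so that readers never see a half-written file.

    Hypothetical helper: the data goes to a temporary file in the same
    directory first and is then renamed over the target. On POSIX the
    rename is atomic, so a job killed mid-write leaves the previous
    checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The trade-off is that every write-out rewrites the complete file, which is fine for small checkpoints but not for a large, append-only trace.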

My concern is now the following: when I specify an .h5 history file (see the post above) by using the suffix .h5, the output folder needs to exist already, whereas it is created automatically when using the CSV history (suffix .csv). I assumed it would be handled the same way as with the CSV history. Of course, I can easily work around this by creating the folders manually.

I am not sure which behaviour is expected, but I guess consistency across both history writers might be desired?
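
The manual workaround mentioned above amounts to creating the parent folder before handing the path to the history. A minimal sketch using only the standard library; `ensure_parent_dir` is a hypothetical helper name and not part of pyPESTO.

```python
from pathlib import Path


def ensure_parent_dir(storage_file):
    """Create the parent directory of storage_file if it is missing.

    Hypothetical helper: returns the path unchanged so that it can be
    passed on as the storage file. The file itself is not created.
    """
    path = Path(storage_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `ensure_parent_dir("output/run1/results.h5")` before building the history options would make the .h5 case behave like the CSV case described above, regardless of which writer is used.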

stephanmg avatar Sep 01 '21 09:09 stephanmg

It does not create the directory? That is weird, as it should do that automatically... (in pypesto.optimize.util, lines 36-41)

PaulJonasJost avatar Sep 01 '21 11:09 PaulJonasJost

Yes, it won't create the folder, I can share a screencast if required.

stephanmg avatar Sep 01 '21 12:09 stephanmg