
Resume experiment with sacred

Open nicoliKim opened this issue 7 years ago • 3 comments

Hi everyone, first of all, thanks to the developers: Sacred is a really awesome and powerful tool!

I have the following problem. I am using Sacred only as a tool to store the args that my scripts receive from the parser, saved as a dictionary in the .json file.

Now I have a model, say a TensorFlow/PyTorch DNN, and I want to stop the training and resume it afterwards. I should also mention that I am customising the output folder name by passing it as an arg to the model, instead of using the default IDs 1, 2, 3, etc. (see issue #330).

The problem now is that when I stop the training and try to resume it afterwards, Sacred raises an error because the output folder with the given name already exists: FileExistsError: [Errno 17] File exists: 'out/out'.
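The failure mode can be reproduced without Sacred, using only the standard library. This is a minimal sketch that assumes the run directory is created with a strict, `os.mkdir`-style call; that is an assumption about the internals, and the helper names `create_strict` and `create_or_reuse` are hypothetical.

```python
import os
import tempfile


def create_strict(path):
    """Create the directory and fail if it already exists."""
    os.mkdir(path)


def create_or_reuse(path):
    """Create the directory, or reuse it when it is already there."""
    os.makedirs(path, exist_ok=True)


# Temporary base directory so the sketch does not touch real experiment data.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "out"))
run_dir = os.path.join(base, "out", "out")  # mirrors 'out/out' from the error

create_strict(run_dir)  # first run: the directory is created
try:
    create_strict(run_dir)  # resumed run with the same folder name
    second_run_failed = False
except FileExistsError:
    second_run_failed = True

create_or_reuse(run_dir)  # tolerant variant: no error on the resumed run
```

The only difference between the two helpers is whether an existing directory counts as an error, which is the behaviour a resume option would need to switch.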

Is there an easy way to tell Sacred that, if the folder already exists, it should save the experiment's settings and files in it?

Thanks in advance to whoever helps me solve this simple yet annoying issue.

nicoliKim avatar Jul 05 '18 14:07 nicoliKim

Hi,

thank you for the nice words! Your use case makes a lot of sense and should definitely be supported in Sacred. Unfortunately, there is currently no way of telling Sacred to resume an experiment. The MongoObserver has its own overwrite argument, but it is quite limited. It would be much better to add a general resume mode that tells each observer to overwrite/amend a previous entry. Such a --resume option is discussed in #291 (which is closely related to your problem, except that it focuses on the MongoObserver instead of the FileStorageObserver). But that feature will require some thought and work. So as a short-term solution, the easiest for you would probably be to adapt the FileStorageObserver and add a custom overwrite mode similar to that of the MongoObserver.
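As a rough illustration of what such an overwrite mode could look like, here is a standalone sketch. It deliberately does not import or subclass anything from Sacred: `ResumableFileStore`, `start_run`, and the `overwrite` flag are hypothetical names, and wiring the same idea into the real FileStorageObserver would depend on the internals of the installed Sacred version.

```python
import json
import os
import tempfile


class ResumableFileStore:
    """Hypothetical stand-in for a file-based observer; not part of Sacred."""

    def __init__(self, basedir, overwrite=False):
        self.basedir = basedir
        self.overwrite = overwrite
        self.dir = None

    def start_run(self, run_id, config):
        """Create (or reuse) the run directory and store the config in it."""
        self.dir = os.path.join(self.basedir, str(run_id))
        # overwrite=False keeps the strict behaviour (FileExistsError);
        # overwrite=True reuses a directory left by an earlier run.
        os.makedirs(self.dir, exist_ok=self.overwrite)
        with open(os.path.join(self.dir, "config.json"), "w") as f:
            json.dump(config, f, indent=2, sort_keys=True)
        return self.dir


# Usage: the first run creates the folder, the resumed run reuses it.
basedir = tempfile.mkdtemp()
ResumableFileStore(basedir).start_run("out", {"lr": 0.01})
resumed_dir = ResumableFileStore(basedir, overwrite=True).start_run(
    "out", {"lr": 0.01, "resumed": True}
)
```

Note that this sketch replaces config.json on resume. Whether a real resume mode should overwrite or amend the previous entry is exactly the open design question mentioned above.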

@JarnoRFB maybe you could comment on your progress if any?

Qwlouse avatar Jul 09 '18 22:07 Qwlouse

Unfortunately, there is no real progress from my side...

JarnoRFB avatar Jul 13 '18 10:07 JarnoRFB

Dear all, sorry for my late reply.

I didn't make any progress either.

Nonetheless, I found a somewhat tricky and very handcrafted way to work around the issue.

It is nothing special, just a few lines of code, but if you are ever interested in it, just let me know.

Best,

Kim


nicoliKim avatar Jul 16 '18 11:07 nicoliKim