context-encoder

How to resume training from a certain epoch value?

Open maryam089 opened this issue 7 years ago • 7 comments

Hi! Your work is great, but I want to know: if I want to resume training from a certain epoch, how do I edit your center-inpainting training code to do this?

maryam089 avatar Dec 25 '17 15:12 maryam089

When I try to load the pre-trained network and then update it with more training, I get the following error. Any help, @pathak22?

/home/maryam/torch/install/bin/lua: /home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: check that you are sharing parameters and gradParameters
stack traceback:
    [C]: in function 'assert'
    /home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: in function 'getParameters'
    train.lua:270: in main chunk
    [C]: in function 'dofile'
    ...ryam/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: in ?

maryam089 avatar Dec 29 '17 06:12 maryam089

@pathak22 kindly help me

maryam089 avatar Feb 17 '18 07:02 maryam089

@maryam089 Can you paste the full log? Where is this pre-trained network from? Was it trained with the context encoder training code? Are the architecture and other details the same?

pathak22 avatar Feb 17 '18 07:02 pathak22

Well, I am using your already-trained network on ImageNet 100k (center-region inpainting), but now I want to add a few thousand more images and train it again. How can I load the weights and parameters into the network when I train it for 1 or 2 more epochs, starting from the already-trained network? @pathak22

maryam089 avatar Feb 17 '18 07:02 maryam089

@pathak22 If I stop training for any reason, can I resume the training from the epoch it stopped at? It always starts from the beginning.

NerminSalem avatar Feb 17 '18 07:02 NerminSalem

@NerminSalem @maryam089

Sorry, I didn't provide this functionality in the training code (I should have!). But it should not be hard to implement if you look at this file and see how the network is first loaded. After this, the loaded network is the same as the one defined here, and hence you won't need to define it again. Feel free to make a pull request if you would like. Thanks!

pathak22 avatar Feb 17 '18 08:02 pathak22
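
The approach described in the reply above (load the saved network instead of defining a new one, then continue the training loop) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's code: the option names `resume_path` and `start_epoch` are hypothetical and do not exist in the original train.lua, and `buildNetwork` is a toy stand-in for the real network definition.

```lua
require 'torch'
require 'nn'

-- Hypothetical options: `resume_path` and `start_epoch` are NOT flags
-- of the original train.lua; they are invented for this sketch.
local opt = {
   resume_path = '',   -- path to a previously saved .t7 network; '' trains from scratch
   start_epoch = 1,    -- epoch to continue from when resuming
   niter = 3,          -- last epoch to run
}

-- Toy stand-in for the network definition in the training script.
local function buildNetwork()
   return nn.Sequential():add(nn.Linear(4, 4))
end

-- Load the saved network if a path is given; otherwise define a new one.
local net
if opt.resume_path ~= '' then
   net = torch.load(opt.resume_path)
else
   net = buildNetwork()
end

local criterion = nn.MSECriterion()
local input, target = torch.randn(8, 4), torch.randn(8, 4)  -- dummy data

for epoch = opt.start_epoch, opt.niter do
   net:zeroGradParameters()
   local output = net:forward(input)
   local loss = criterion:forward(output, target)
   net:backward(input, criterion:backward(output, target))
   net:updateParameters(0.01)  -- plain SGD step with learning rate 0.01
   -- Save once per epoch so that a later run can continue from epoch + 1.
   torch.save(string.format('toy_net_epoch%d.t7', epoch), net)
   print(string.format('epoch %d, loss %.4f', epoch, loss))
end
```

To continue after epoch 3 in this toy setup, one would set `resume_path = 'toy_net_epoch3.t7'`, `start_epoch = 4`, and a larger `niter`. Note that this restores only the network weights; if the real training uses an optimizer with internal state, that state would presumably need to be saved and reloaded as well for an exact continuation.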

@pathak22 Hi, this is regarding re-training the ImageNet/Paris model that you have shared. We referred to the two links below:
https://github.com/torch/demos/blob/master/train-a-digit-classifier/train-on-mnist.lua
https://github.com/facebook/fb.resnet.torch/issues/116
We understand that they use a command-line argument to indicate whether it's re-training or training from the beginning.
Do you have any such command-line argument in your code to indicate re-training? Can you please suggest the code changes that could be made?

harshithabk avatar Apr 16 '18 07:04 harshithabk