chainer-fast-neuralstyle
.State and .Model
Hi, when training I set the checkpoint to save every 1000 iterations, and it writes two files:
.state and .model
Can I use these to test? Which file do I use?
Also, if training crashes, can I restart training from one of these checkpoints?
Yes, the .state file contains the optimizer state and is only needed for resuming training, either after a cancellation or after successful completion if you wish to train more.
It is not required for generation; use only the .model file to test.
Note: reasonable results usually show up only after 40000 iterations; you will likely get very noisy images earlier. I prefer to set checkpoints no more often than every 10000 iterations.
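To illustrate which file to pick for testing: judging by the log later in this thread, checkpoints appear to be named `<style>_<epoch>_<iteration>.model` / `.state`. Assuming that naming convention, a small hypothetical helper (not part of the repo) can select the most recent .model file for generation, ignoring the .state files:

```python
def latest_model(filenames):
    """Return the .model checkpoint with the highest iteration count.

    Assumes names like 'Sketch_0_30000.model' (style, epoch, iteration),
    as seen in the training log. .state files are skipped: they hold
    the optimizer state and are only useful for resuming training.
    """
    models = [f for f in filenames if f.endswith('.model')]

    def iteration(name):
        stem = name.rsplit('.', 1)[0]          # strip the extension
        return int(stem.rsplit('_', 1)[1])     # trailing iteration number

    return max(models, key=iteration)


print(latest_model(['Sketch_0_10000.model', 'Sketch_0_10000.state',
                    'Sketch_0_30000.model', 'Sketch_0_30000.state']))
```

You would then pass the returned .model path to the generation script.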
Ah thanks for the info. How do I resume training using the .state file?
Pass the model snapshot and optimizer state with -i and -r options respectively. Something like this:
python train.py \
-s <style_image_path> \
-d <training_dataset_path> \
-i <input_model_path> \
-r <optimizer_state_path> \
-c 10000 -g 0
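Since resuming is just the original command plus the -i and -r flags, a trivial hypothetical helper (not part of the repo; flag names taken from the command above) can assemble the invocation, which is handy when scripting repeated restarts:

```python
import shlex


def resume_command(style, dataset, model, state, checkpoint=10000, gpu=0):
    """Build the train.py command line for resuming from a checkpoint.

    Flags: -s style image, -d dataset dir, -i model snapshot,
    -r optimizer state, -c checkpoint interval, -g GPU id.
    """
    args = ['python', 'train.py',
            '-s', style, '-d', dataset,
            '-i', model, '-r', state,
            '-c', str(checkpoint), '-g', str(gpu)]
    return ' '.join(shlex.quote(a) for a in args)


print(resume_command('style.jpg', 'train2014', 'a.model', 'a.state'))
```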
Thanks a lot. I have set off a retrain pointing to the model and state; however, it seems to be starting from the beginning again?
ubuntu@ip-172-31-19-95:~/chainer-fast-neuralstyle$ python train.py -s /home/ubuntu/chainer-fast-neuralstyle/sample_images/Sketch.jpg -d /home/ubuntu/train2014 -i /home/ubuntu/chainer-fast-neuralstyle/models/Sketch_0_30000.model -r /home/ubuntu/chainer-fast-neuralstyle/models/Sketch_0_30000.state -c 10000 -g 0
num traning images: 82783
82783 iterations, 2 epochs
load model from /home/ubuntu/chainer-fast-neuralstyle/models/Sketch_0_30000.model
load optimizer state from /home/ubuntu/chainer-fast-neuralstyle/models/Sketch_0_30000.state
epoch 0
(epoch 0) batch 0/82783... training loss is...272908.28125
(epoch 0) batch 1/82783... training loss is...122521.828125
(epoch 0) batch 2/82783... training loss is...109763.117188
(epoch 0) batch 3/82783... training loss is...109465.929688
(epoch 0) batch 4/82783... training loss is...141037.921875
(epoch 0) batch 5/82783... training loss is...97152.671875
(epoch 0) batch 6/82783... training loss is...130711.234375
(epoch 0) batch 7/82783... training loss is...144749.28125
(epoch 0) batch 8/82783... training loss is...202026.890625
(epoch 0) batch 9/82783... training loss is...124754.890625
(epoch 0) batch 10/82783... training loss is...180410.890625
(epoch 0) batch 11/82783... training loss is...140089.125
(epoch 0) batch 12/82783... training loss is...204231.734375
(epoch 0) batch 13/82783... training loss is...124666.90625
(epoch 0) batch 14/82783... training loss is...226574.03125
(epoch 0) batch 15/82783... training loss is...116684.5625
(epoch 0) batch 16/82783... training loss is...131869.203125
(epoch 0) batch 17/82783... training loss is...162770.234375
(epoch 0) batch 18/82783... training loss is...281715.09375
(epoch 0) batch 19/82783... training loss is...121658.664062
No, it's continuing from the point where you left off. Indeed, the script doesn't track progress within an epoch, so it might appear to start from the beginning, but note the loss levels: they stay at the same low values as at the end of the previous session. A newly initialized model would produce much greater values.
One point to consider: if you didn't finish the epoch, the restarted run will reuse the same images from the beginning. You could manually skip the first 30k images by adjusting the script, or, preferably, resume only after an epoch has completed.
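The manual skip suggested above could be sketched as follows. This is a hypothetical helper, not code from the repo: it assumes the training loop walks a list of image paths in a fixed order each epoch, and simply drops the images already consumed in the unfinished epoch:

```python
def skip_completed(image_paths, iterations_done, batch_size=1):
    """Drop the images already seen in an unfinished epoch.

    iterations_done: how many batches the previous run completed
    (e.g. 30000 for a checkpoint named Sketch_0_30000.model).
    Assumes a fixed, unshuffled traversal order per epoch.
    """
    return image_paths[iterations_done * batch_size:]


# e.g. resume at image 30000 of the 82783-image COCO train2014 list:
# paths = skip_completed(paths, 30000)
print(skip_completed(['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'], 2))
```

Note this only helps within the interrupted epoch; once that epoch finishes, the next one should iterate over the full list again.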