ShapePFCN

Expected Training Time

Open • vhanded opened this issue 7 years ago • 4 comments

Hi Kalov, thanks for this wonderful work. I just spent hours setting up all the dependencies to run your code, and I am currently training on psbAirplane1.

It is currently at iteration 1700 (LR = 0.001, loss = 0.0786143) and has been running for 10 hours on an 8 GB GPU.

I am pretty new to machine learning and still trying to digest all these terms. How many iterations should I expect for this dataset?

Thanks.

vhanded, May 07 '18 00:05

I recommend using the predefined number of epochs. The estimated number of iterations (derived from the number of epochs and the size of the dataset) should appear in the output log.
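As a rough illustration of how an iteration count follows from epochs and dataset size, here is a minimal sketch assuming the usual Caffe relation (this is not taken from the ShapePFCN source): one solver iteration accumulates gradients over `iter_size` forward/backward passes of `batch_size` images each. The solver parameters reported in this thread use `iter_size: 8`; the helper name and the `num_epochs` and `batch_size` values are hypothetical.

```python
import math


def estimate_total_iterations(num_epochs: float, num_train_images: int,
                              batch_size: int, iter_size: int) -> int:
    """Hypothetical helper: solver iterations needed for `num_epochs` passes
    over `num_train_images` images.

    In Caffe, one iteration runs `iter_size` forward/backward passes of
    `batch_size` images each before a single weight update, so it consumes
    batch_size * iter_size images.
    """
    images_per_iteration = batch_size * iter_size
    # Round up so the last partial pass over the data is still covered.
    return math.ceil(num_epochs * num_train_images / images_per_iteration)
```

Doubling `iter_size` halves the number of iterations for the same number of epochs, but each iteration takes roughly twice as long, so wall-clock time stays about the same.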

kalo-ai, May 07 '18 00:05

I found the log file at /tmp/mvfcn.INFO.

Here is the top of the log:

Log file created at: 2018/05/06 09:54:41
Running on machine: HOSTNAME
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0506 09:54:41.141142 24246 solver.cpp:48] Initializing solver from parameters: 
train_net: "psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_train_net.txt"
test_net: "psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_test_net.txt"
test_iter: 1028
test_interval: 1000
base_lr: 0.001
display: 10
max_iter: 9646
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.001
stepsize: 100000
snapshot: 1000
snapshot_prefix: "psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model"
solver_mode: GPU
device_id: 0
regularization_type: "L2"
iter_size: 8
snapshot_format: HDF5
type: "Nesterov"
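These parameters also explain why the log keeps reporting LR = 0.001. Caffe's "step" policy computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). A minimal sketch of that formula (the function name is illustrative):

```python
def caffe_step_lr(iteration: int, base_lr: float, gamma: float, stepsize: int) -> float:
    # Caffe "step" policy: multiply base_lr by gamma once every `stepsize` iterations.
    return base_lr * gamma ** (iteration // stepsize)
```

With stepsize: 100000 and max_iter: 9646, iteration // stepsize is always 0, so the rate stays at base_lr (0.001) for the whole run and the step decay never takes effect.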

After 24 hours, training is at iteration 4000 of 9646, so I expect it to need roughly another 30 hours.
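That estimate is a linear extrapolation from the average speed so far. A minimal sketch with a hypothetical helper, assuming the average throughput (including the periodic validation passes every test_interval iterations) stays constant:

```python
def estimate_remaining_hours(elapsed_hours: float, current_iter: int, max_iter: int) -> float:
    # Average iterations per hour so far, measured on wall-clock time.
    rate = current_iter / elapsed_hours
    # Time for the remaining iterations at that same average rate.
    return (max_iter - current_iter) / rate
```

With 4000 of 9646 iterations done in 24 hours, this gives about 34 more hours (5646 * 24 / 4000 = 33.876). The earlier figure of 1700 iterations in 10 hours gives about 47 more hours (7946 / 170), i.e. roughly 57 hours in total.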

Thanks.

vhanded, May 07 '18 13:05

Hi Kalov, the first training run completed after 50+ hours and printed a list of test results:

 + Tested image 193/204: model70_int_000018_000999_000000_000000@ [accuracy = 71.1881] (view id: 0, ground truth seg. image: model70_lbl_000018_000999_000000_000000@)
  + Tested image 194/204: model70_int_000006_000006_000002_000003@ [accuracy = 94.9348] (view id: 2, ground truth seg. image: model70_lbl_000006_000006_000002_000003@)
  + Tested image 195/204: model70_int_000016_000016_000002_000001@ [accuracy = 97.0964] (view id: 2, ground truth seg. image: model70_lbl_000016_000016_000002_000001@)
  + Tested image 196/204: model70_int_000010_000010_000002_000003@ [accuracy = 92.1914] (view id: 2, ground truth seg. image: model70_lbl_000010_000010_000002_000003@)
  + Tested image 197/204: model70_int_000000_000000_000002_000000@ [accuracy = 94.7188] (view id: 2, ground truth seg. image: model70_lbl_000000_000000_000002_000000@)
  + Tested image 198/204: model70_int_000016_000016_000002_000000@ [accuracy = 97.1524] (view id: 2, ground truth seg. image: model70_lbl_000016_000016_000002_000000@)
  + Tested image 199/204: model70_int_000007_000007_000002_000001@ [accuracy = 93.8485] (view id: 2, ground truth seg. image: model70_lbl_000007_000007_000002_000001@)
  + Tested image 200/204: model70_int_000001_000391_000000_000003@ [accuracy = 97.9188] (view id: 0, ground truth seg. image: model70_lbl_000001_000391_000000_000003@)
  + Tested image 201/204: model70_int_000003_000003_000002_000002@ [accuracy = 94.1133] (view id: 2, ground truth seg. image: model70_lbl_000003_000003_000002_000002@)
  + Tested image 202/204: model70_int_000027_000551_000001_000003@ [accuracy = 93.543] (view id: 1, ground truth seg. image: model70_lbl_000027_000551_000001_000003@)
  + Tested image 203/204: model70_int_000021_000299_000001_000003@ [accuracy = 97.1831] (view id: 1, ground truth seg. image: model70_lbl_000021_000299_000001_000003@)
  + Tested image 204/204: model70_int_000008_000008_000002_000000@ [accuracy = 90.5587] (view id: 2, ground truth seg. image: model70_lbl_000008_000008_000002_000000@)
=> Mean image accuracy for mesh 20/20: psbAirplane1/model70.obj [accuracy = 92.6206]
Mean image accuracy in validation set for camera orbit  0: 91.6357
Mean image accuracy in validation set for camera orbit  1: 94.1654
Mean image accuracy in validation set for camera orbit  2: 94.5154

Mean accuracy for all images: 96.638

Then all the snapshots were deleted:

Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_4000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_2000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_9000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_7000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_6000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_9000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_8000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_9646.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_6000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_7000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_150.hdf5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_1000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_9646.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_5000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_2000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_1000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_5000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_4000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_3000.caffemodel.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_3000.solverstate.h5
Deleting snapshot: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_8000.solverstate.h5

And then the training process restarted:

***** FCN PRE-TRAINING STARTS HERE *****
Searching for rendered images from mesh 1/20: psbAirplane1/model72.obj...
Found 204 images.
Searching for rendered images from mesh 2/20: psbAirplane1/model77.obj...
Found 204 images.
Searching for rendered images from mesh 3/20: psbAirplane1/model61.obj...
Found 204 images.
Searching for rendered images from mesh 4/20: psbAirplane1/model64.obj...
Found 208 images.
Searching for rendered images from mesh 5/20: psbAirplane1/model76.obj...
Found 208 images.
Searching for rendered images from mesh 6/20: psbAirplane1/model74.obj...
Found 204 images.
Searching for rendered images from mesh 7/20: psbAirplane1/model79.obj...
Found 208 images.
Searching for rendered images from mesh 8/20: psbAirplane1/model63.obj...
Found 204 images.
Searching for rendered images from mesh 9/20: psbAirplane1/model71.obj...
Found 204 images.
Searching for rendered images from mesh 10/20: psbAirplane1/model65.obj...
Found 208 images.
Searching for rendered images from mesh 11/20: psbAirplane1/model67.obj...
Found 204 images.
Searching for rendered images from mesh 12/20: psbAirplane1/model62.obj...
Found 212 images.
Searching for rendered images from mesh 13/20: psbAirplane1/model69.obj...
Found 204 images.
Searching for rendered images from mesh 14/20: psbAirplane1/model78.obj...
Found 200 images.
Searching for rendered images from mesh 15/20: psbAirplane1/model75.obj...
Found 204 images.
Searching for rendered images from mesh 16/20: psbAirplane1/model66.obj...
Found 212 images.
Searching for rendered images from mesh 17/20: psbAirplane1/model68.obj...
Found 208 images.
Searching for rendered images from mesh 18/20: psbAirplane1/model80.obj...
Found 204 images.
Searching for rendered images from mesh 19/20: psbAirplane1/model73.obj...
Found 208 images.
Searching for rendered images from mesh 20/20: psbAirplane1/model70.obj...
Found 204 images.
Will use GPUs: 0
GPU 0: Quadro M4000
I0508 19:17:23.778117 24246 solver.cpp:48] Initializing solver from parameters: 
train_net: "psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_train_net.txt"
test_net: "psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_test_net.txt"
test_iter: 1028
test_interval: 1000
base_lr: 0.001
display: 10
max_iter: 9000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.001
stepsize: 100000
snapshot: 1000
snapshot_prefix: "psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model"
solver_mode: GPU
device_id: 0
regularization_type: "L2"
iter_size: 8
snapshot_format: HDF5
type: "Nesterov"
I0508 19:17:23.778314 24246 solver.cpp:83] Creating training net from train_net file: psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_train_net.txt
I0508 19:17:23.778959 24246 net.cpp:49] Initializing net from parameters: 
state {
  phase: TRAIN
}

This time max_iter is only 9000 (the first run used 9646).

Is this expected behavior, and if so, why?

Thanks.

vhanded, May 09 '18 06:05

Yes, it is. In the first step, the method measures accuracy on a hold-out validation set (randomly extracted from the initial training set) every 1000 iterations, up to the specified maximum number of iterations. At the end of the first step, it finds the iteration K with the highest validation accuracy (because of over-fitting, K is not necessarily the last iteration). It then re-trains the network on the whole training set for K iterations.
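A minimal sketch of this two-step schedule, assuming hypothetical `train` and `validate` callables that stand in for the Caffe solver and the evaluation code (they are not ShapePFCN functions). Validating at every test_interval plus the final iteration is an assumption based on the snapshots listed in the log (1000 through 9000, plus 9646).

```python
from typing import Callable, Dict, List


def checkpoint_iterations(max_iter: int, test_interval: int) -> List[int]:
    """Iterations at which validation runs: every test_interval, plus max_iter."""
    its = list(range(test_interval, max_iter + 1, test_interval))
    if not its or its[-1] != max_iter:
        its.append(max_iter)
    return its


def select_best_iteration(val_accuracy: Dict[int, float]) -> int:
    """Iteration K with the highest validation accuracy (the earliest one wins ties)."""
    return max(sorted(val_accuracy), key=lambda it: val_accuracy[it])


def two_stage_training(
    train: Callable[[str, int, List[int]], Dict[int, object]],
    validate: Callable[[object], float],
    max_iter: int,
    test_interval: int = 1000,
) -> int:
    # Step 1: train on the training set minus the random hold-out split,
    # keeping a snapshot at every checkpoint.
    checkpoints = checkpoint_iterations(max_iter, test_interval)
    snapshots = train("train_without_holdout", max_iter, checkpoints)
    val_accuracy = {it: validate(snapshots[it]) for it in checkpoints}
    k = select_best_iteration(val_accuracy)
    # Step 2: re-train on the whole training set for K iterations.
    train("full_train", k, [k])
    return k
```

If the best validation accuracy in the first run occurred at iteration 9000, this procedure re-trains with max_iter = 9000, which matches the second solver configuration above.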

This is an unusual engineering choice that seems to give a tiny boost of less than 1% (i.e., it seems useful only for small datasets of ~20 shapes, like PSB, where over-fitting is an issue) at the expense of much slower training. Of course, K is not necessarily the optimal number of iterations for the whole training set either.

If you prefer, you can deactivate the second step (by editing the code), keep all snapshots, and simply use the snapshot at iteration K from the first step.
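If the snapshots are kept, the weights for iteration K can be located from Caffe's HDF5 snapshot naming visible in the log, `<snapshot_prefix>_iter_<N>.caffemodel.h5` (the `.solverstate.h5` files hold solver state, not the model). A minimal sketch with hypothetical helper names; it only builds and parses file names and does not change how ShapePFCN saves or deletes snapshots.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Caffe HDF5 model snapshots are named <snapshot_prefix>_iter_<N>.caffemodel.h5
_MODEL_SNAPSHOT = re.compile(r"_iter_(\d+)\.caffemodel\.h5$")


def snapshot_path(snapshot_prefix: str, k: int) -> Path:
    """Path of the model weights saved at iteration k (HDF5 snapshot format)."""
    return Path(f"{snapshot_prefix}_iter_{k}.caffemodel.h5")


def snapshot_iterations(filenames: Iterable[str]) -> List[int]:
    """Sorted iterations that have a .caffemodel.h5 snapshot; other files are ignored."""
    found = set()
    for name in filenames:
        match = _MODEL_SNAPSHOT.search(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)
```

For example, with the snapshot_prefix from the solver parameters and K = 9000, `snapshot_path` points to `psbAirplane1/_MVFCN_Learning_Metadata_/frontend_vgg_model_iter_9000.caffemodel.h5`.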

kalo-ai, May 10 '18 00:05