fast-rcnn
How to use new snapshotting?
fast-rcnn doesn't take --snapshot as an argument, so I'm not sure how to use a snapshot.
I'm asking because /models/VGG16/solver.prototxt says: "We disable standard caffe solver snapshotting and implement our own snapshot"
Thanks
It's in ./lib/fast_rcnn/config.py
In that file I can change the interval between snapshots and the snapshot infix, but there's nothing about resuming from a snapshot during training.
Would I just set the snapshot number in solver.prototxt to reference the current snapshot?
@xksteven I guess you would like to do validation during training? I am not sure whether that's supported by the current fast-rcnn edition: all the forward work is launched from the Python side, and I don't think there is a testing function during training for now. I'm afraid you might need to revise the code yourself.
@WilsonWangTHU You know how in stock Caffe you can pass the --snapshot=model_iter_xxx.solverstate option to restart training from that point? Normally in Caffe the solverstate and the weights saved as model_iter_xxx.caffemodel end up in the same directory, but with fast-rcnn I only see the caffemodel saved in output/default/imdb_trainval. I'd like to be able to restart training from the weights stored there.
I'm running on a cluster with a time limit, so my process gets killed at fixed intervals. I just want to be able to restart training from the latest snapshot.
I have the same problem.
How to restart the training from a snapshot? Can anyone provide some tips? Thanks.
@kyuusaku @xksteven I have met the same problem. Did you guys find an effective solution? Thanks
Make the following modifications and you will be able to use the --snapshot argument
In tools/train_net.py
```python
def parse_args():
    """Parse input arguments"""
    parser = argparse.ArgumentParser(description='Train a Fast R-CNN network')
    parser.add_argument('--gpu', dest='gpu_id',
                        help='GPU device id to use [0]',
                        default=0, type=int)
    parser.add_argument('--solver', dest='solver',
                        help='solver prototxt',
                        default=None, type=str)
    parser.add_argument('--iters', dest='max_iters',
                        help='number of iterations to train',
                        default=40000, type=int)
    parser.add_argument('--weights', dest='pretrained_model',
                        help='initialize with pretrained model weights',
                        default=None, type=str)
    parser.add_argument('--snapshot', dest='previous_state',
                        help='initialize with previous state',
                        default=None, type=str)
    parser.add_argument('--cfg', dest='cfg_file',
                        help='optional config file',
                        default=None, type=str)
    parser.add_argument('--imdb', dest='imdb_name',
                        help='dataset to train on',
                        default='voc_2007_trainval', type=str)
    parser.add_argument('--rand', dest='randomize',
                        help='randomize (do not use a fixed seed)',
                        action='store_true')
    parser.add_argument('--set', dest='set_cfgs',
                        help='set config keys', default=None,
                        nargs=argparse.REMAINDER)

    args = parser.parse_args()
    return args
```
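As a quick sanity check that the new flag parses as intended, here's a minimal argparse sketch with just the added option (the solverstate filename is made up):

```python
import argparse

# Minimal parser containing only the added flag; dest mirrors the patch above.
parser = argparse.ArgumentParser()
parser.add_argument('--snapshot', dest='previous_state',
                    help='initialize with previous state',
                    default=None, type=str)

args = parser.parse_args(['--snapshot', 'vgg16_iter_10000.solverstate'])
print(args.previous_state)  # vgg16_iter_10000.solverstate
```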
In lib/fast_rcnn/train.py
```python
class SolverWrapper(object):
    """A simple wrapper around Caffe's solver.

    This wrapper gives us control over the snapshotting process, which we
    use to unnormalize the learned bounding-box regression weights.
    """

    def __init__(self, solver_prototxt, roidb, output_dir,
                 pretrained_model=None, previous_state=None):
        """Initialize the SolverWrapper."""
        self.output_dir = output_dir

        print 'Computing bounding-box regression targets...'
        self.bbox_means, self.bbox_stds = \
                rdl_roidb.add_bbox_regression_targets(roidb)
        print 'done'

        self.solver = caffe.SGDSolver(solver_prototxt)
        if pretrained_model is not None:
            print ('Loading pretrained model '
                   'weights from {:s}').format(pretrained_model)
            self.solver.net.copy_from(pretrained_model)
        elif previous_state is not None:
            print 'Restoring state from {:s}'.format(previous_state)
            self.solver.restore(previous_state)

        self.solver_param = caffe_pb2.SolverParameter()
        with open(solver_prototxt, 'rt') as f:
            pb2.text_format.Merge(f.read(), self.solver_param)

        self.solver.net.layers[0].set_roidb(roidb)

    # ...

def train_net(solver_prototxt, roidb, output_dir,
              pretrained_model=None, max_iters=40000, previous_state=None):
    """Train a Fast R-CNN network."""
    sw = SolverWrapper(solver_prototxt, roidb, output_dir,
                       pretrained_model=pretrained_model,
                       previous_state=previous_state)

    print 'Solving...'
    sw.train_model(max_iters)
    print 'done solving'
```
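To make the control flow concrete, here is a tiny stand-in (a hypothetical stub, not pycaffe) showing the branch in the constructor above: pretrained weights take priority, otherwise the previous solver state is restored:

```python
# Hypothetical stub mimicking the pycaffe solver interface, to show
# the pretrained_model / previous_state branch in isolation.
class FakeSolver(object):
    def __init__(self):
        self.loaded = None

    def copy_from(self, path):          # stands in for solver.net.copy_from
        self.loaded = ('weights', path)

    def restore(self, path):            # stands in for solver.restore
        self.loaded = ('state', path)


def init_solver(pretrained_model=None, previous_state=None):
    solver = FakeSolver()
    if pretrained_model is not None:
        solver.copy_from(pretrained_model)
    elif previous_state is not None:
        solver.restore(previous_state)
    return solver
```

So passing --weights starts from pretrained weights, while passing --snapshot alone resumes the saved solver state.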
Thanks for the code, but how do you save the solverstate during Fast R-CNN training? It looks like the method Solver::SnapshotSolverState isn't exported to pycaffe...
Did you change "snapshot: 0" to "snapshot: 10000" in your solver.prototxt? That lets you save the state every 10000 iterations, for example.
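For reference, the relevant solver.prototxt fields look something like this (the prefix value is just an example):

```
# Re-enable Caffe's own snapshotting: write a .solverstate
# (and .caffemodel) every 10000 iterations.
snapshot: 10000
snapshot_prefix: "vgg16_fast_rcnn"
```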
Ah, thanks! Didn't think of that...
@lynetcha, one more modification:
In tools/train_net.py
```python
output_dir = get_output_dir(imdb)
print 'Output will be saved to `{:s}`'.format(output_dir)
train_net(args.solver, roidb, output_dir,
          pretrained_model=args.pretrained_model,
          max_iters=args.max_iters, previous_state=args.previous_state)
```
Also remember to omit the --weights param when resuming.
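With those changes in place, resuming could then look something like this (the solverstate path and iteration count are hypothetical):

```sh
./tools/train_net.py --gpu 0 \
    --solver models/VGG16/solver.prototxt \
    --imdb voc_2007_trainval \
    --iters 40000 \
    --snapshot output/default/voc_2007_trainval/vgg16_fast_rcnn_iter_10000.solverstate
```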
hi @po0ya
What if I don't save the extra file with the last-layer weights? Would that give bad mAP after retraining?
Hello @twmht
Basically it'll mess up the whole network if you want to continue training. The network is trained to regress zero-mean, unit-variance bbox targets. For test-time convenience, the weights and biases of the last layer are scaled by the std and shifted by the mean; if that had not been done, the predictions would have to be scaled and shifted manually instead. It's purely a test-time convenience: those weights are not the ones learned by backprop, so retraining from them would be meaningless for the network.
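A minimal NumPy sketch (shapes and statistics made up) of why this is only a test-time convenience: folding the target stats into the layer gives the same output as unnormalizing the raw prediction afterwards, but the folded weights are no longer the ones backprop learned:

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.randn(8, 4)                       # learned weights (normalized-target space)
b = rng.randn(4)                          # learned bias
std = np.array([0.1, 0.1, 0.2, 0.2])      # per-coordinate target stds
mean = np.array([0.0, 0.0, 0.05, 0.05])   # per-coordinate target means
x = rng.randn(8)                          # some input feature vector

# Test-time trick: fold the target statistics into the layer parameters.
W_folded = W * std
b_folded = b * std + mean

raw = x.dot(W) + b                        # prediction in normalized space
folded = x.dot(W_folded) + b_folded       # prediction with folded parameters
manual = raw * std + mean                 # explicit unnormalization

# Both routes give the same unnormalized prediction...
assert np.allclose(folded, manual)
# ...but the folded weights differ from the learned ones.
assert not np.allclose(W_folded, W)
```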
EDIT: Add these lines to the end of the SolverWrapper constructor:
```python
net = self.solver.net  # inside SolverWrapper.__init__
found = False
for k in net.params.keys():
    if 'bbox_pred' in k:
        bbox_pred = k
        found = True
        break

if found:
    print('[#] Renormalizing the final layer back')
    net.params[bbox_pred][0].data[4:, :] = \
        (net.params[bbox_pred][0].data[4:, :] *
         1.0 / self.bbox_stds[4:, np.newaxis])
    net.params[bbox_pred][1].data[4:] = \
        (net.params[bbox_pred][1].data - self.bbox_means)[4:] * \
        1.0 / self.bbox_stds[4:]
else:
    print('Warning: layer "bbox_pred" not found')
```
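To sanity-check that renormalization, here's a small NumPy round trip (toy shapes: 3 classes x 4 coords, random stats) showing that dividing by the stds and subtracting the means exactly undoes what the snapshot folding does:

```python
import numpy as np

rng = np.random.RandomState(1)
stds = np.repeat(rng.rand(3) + 0.5, 4)    # per-class, per-coordinate stds (12,)
means = np.repeat(rng.randn(3) * 0.1, 4)  # per-class, per-coordinate means (12,)
W = rng.randn(12, 5)                      # toy bbox_pred weights
b = rng.randn(12)                         # toy bbox_pred biases

# What the snapshot function does before saving: fold the stats in.
W_snap = W * stds[:, np.newaxis]
b_snap = b * stds + means

# What the snippet above does: renormalize back to the learned values.
W_back = W_snap / stds[:, np.newaxis]
b_back = (b_snap - means) / stds

assert np.allclose(W_back, W)
assert np.allclose(b_back, b)
```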
@po0ya But aren't the weights (*.caffemodel) saved by the default solver already normalized, since they were never unnormalized (that caffemodel was not saved through the provided snapshot functionality)? So I guess the produced *.solverstate is linked to a *.caffemodel that was not produced by the fast r-cnn snapshot function. With the resume functionality you get two versions of the caffemodel: the one written by the default solver snapshot, and the one written by the snapshot function in fast r-cnn, whose weights are unnormalized before saving. So I guess the renormalization is not needed.
Net params in the snapshot function of SolverWrapper are first unnormalized, saved, and then restored to the normalized version. So which param version gets saved depends on when the Caffe snapshot is called.
I didn't dig into the Caffe code, but I think disabling snapshotting in solver.prototxt and manually calling solver.snapshot() gives better control over exactly which version is snapshotted.
Actually, I looked into the log and found that the Caffe snapshot is called before the snapshot in SolverWrapper. A diff of the param files shows that the Caffe snapshot indeed saves a different (normalized) version than SolverWrapper does, while manual invocation of solver.snapshot() produces an identical .caffemodel.
So we can safely resume from the .solverstate of the Caffe snapshot without unnormalizing the parameters. But this produces two versions of the .caffemodel; it's up to you which version of the parameters to snapshot.