google_inception_v3_for_caffe

18 categories of ImageNet

Open shadowf126 opened this issue 9 years ago • 5 comments

Hello, could you provide the synsets for the 18 categories and the data preparation scripts you used for training the 11704.caffemodel? Thanks.

shadowf126 avatar Jul 20 '16 08:07 shadowf126

I used DIGITS to prepare the data for training. I just took the first 18 categories from train.txt and val.txt (sorry, I don't have the files anymore). I wrote a simple Python script to parse the full train and val lists and extract the entries with class id less than or equal to 18. The synsets were obtained from caffe using get_ilsvrc_aux.sh.
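For reference, a minimal sketch of such a filtering script, assuming the usual Caffe list format of one "<image_path> <label>" pair per line with 0-indexed labels (so the first 18 synsets carry labels 0-17); the file names are illustrative, this is not the original script:

# Keep only the first 18 ImageNet categories from Caffe image-list files.
NUM_CLASSES = 18

def filter_list(src, dst, num_classes=NUM_CLASSES):
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            path, label = line.rsplit(None, 1)  # split off the trailing class id
            if int(label) < num_classes:
                fout.write(line)

for name in ("train.txt", "val.txt"):
    filter_list(name, name.replace(".txt", "_18.txt"))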

[screenshot attached: 2016-07-22 14:56:44]

smichalowski avatar Jul 22 '16 12:07 smichalowski

@smichalowski , I took the first 18 categories from train.txt and val.txt and reran the experiment, but with bvlc/caffe; at iteration 11704 I got top-1 0.790 / top-5 0.972. The difference is that I used batch_size=55 with iter_size=1. What I mean is: does the different implementation of batchnorm really matter that much?

shadowf126 avatar Jul 25 '16 13:07 shadowf126

The reason I care about the different batchnorm implementations is that I can hardly replicate the result reported in the paper Deep Residual Learning for Image Recognition, yet with Torch it is simple, and the results are even better than the original paper claimed.
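For context, full batch normalization computes y = gamma * (x - mu) / sqrt(var + eps) + beta. BVLC Caffe's BatchNorm layer performs only the normalization half and leaves the learnable affine parameters to separate Scale/Bias layers, while Torch (and NVIDIA's caffe fork) keep both halves in one layer. A minimal NumPy sketch of that split:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    # Normalize over the batch dimension.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # BVLC Caffe's BatchNorm layer stops here
    return gamma * x_hat + beta            # affine step: separate Scale/Bias layers in BVLC

x = np.random.randn(55, 32)                  # e.g. batch_size=55, 32 channels
y = batch_norm(x, np.ones(32), np.zeros(32))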

shadowf126 avatar Jul 25 '16 13:07 shadowf126

@shadowf126 I've tested a bunch of different settings in the solver and train_val. Using vanilla caffe I wasn't able to get top-1 better than ~0.65; NVIDIA caffe worked a lot better.

smichalowski avatar Jul 25 '16 14:07 smichalowski

My train and solver prototxts are a little different, but the network structure is the same.

solver.prototxt

train_net: "models/classifier/ilsvrc12/InceptionV3/299x299/18/train.prototxt"
test_net: "models/classifier/ilsvrc12/InceptionV3/299x299/18/test.prototxt"
test_iter: 180
test_interval: 1064
base_lr: 0.045
display: 10
max_iter: 21280
lr_policy: "step"
gamma: 0.96
power: 0.75
momentum: 0.9
weight_decay: 0.0004
stepsize: 213
snapshot: 1064
snapshot_prefix: "models/classifier/ilsvrc12/InceptionV3/299x299/18/InceptionV3_ilsvrc12_299x299"
solver_mode: GPU
device_id: 0
test_compute_loss: true
debug_info: false
snapshot_after_train: true
delta: 0.9
test_initialization: false
average_loss: 10
clip_gradients: 80.0
iter_size: 1
type: "SGD"
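As an aside, with lr_policy: "step" Caffe computes the rate as base_lr * gamma ^ floor(iter / stepsize), so these settings decay the learning rate by 4% every 213 iterations (the power field is only used by the "inv" and "poly" policies and is ignored here). A quick sanity check in Python:

base_lr, gamma, stepsize = 0.045, 0.96, 213

def lr_at(iteration):
    # Caffe "step" policy: base_lr * gamma ^ floor(iter / stepsize)
    return base_lr * gamma ** (iteration // stepsize)

print(lr_at(0))      # 0.045
print(lr_at(11704))  # ~0.005 after 54 decay steps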

train.prototxt is too long to post in full, but the conv+bn blocks are almost the same:

layer {
  name: "conv"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  param { lr_mult: 1 decay_mult: 1 }
  convolution_param {
    num_output: 32
    bias_term: false
    pad: 0
    kernel_size: 3
    stride: 2
    weight_filler { type: "gaussian" std: 0.01 }
  }
}
layer {
  name: "conv_bn"
  type: "BatchNorm"
  bottom: "conv"
  top: "conv"
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param { eps: 0.001 }
}
layer {
  name: "conv_bias"
  type: "Bias"
  bottom: "conv"
  top: "conv"
  param { lr_mult: 1 decay_mult: 0 }
  bias_param { filler { type: "constant" value: 0.0 } }
}
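Worth noting: this differs from the more common BVLC pattern of BatchNorm followed by a Scale layer with bias_term: true. The three BatchNorm parameter blobs (mean, variance, and the moving-average factor) are correctly frozen with lr_mult: 0, but the affine part here is a Bias layer only, i.e. a learnable shift with no learnable scale, which may itself contribute to the discrepancy between the two setups.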

shadowf126 avatar Jul 27 '16 07:07 shadowf126