kaggle_statefarm

Incorrect shape error

Open · opraveen opened this issue 8 years ago · 3 comments

I created the train and val .rec files, but when I run the training script with the Inception-BN model, I get this incorrect shape error:

    $ ./run.cv_inception_bn.sh
    2016-07-17 01:02:57,342 Node[0] start with arguments Namespace(batch_size=32, clip_gradient=5.0, data_dir='./', data_shape=224, dataset='ft', finetune_from='model/Inception_BN-0039', finetune_lr_scale=10, gpus='0', kv_store='local', load_epoch=None, log_dir='./tmp/', log_file=None, lr=0.001, lr_factor=1, lr_factor_epoch=1, model_prefix='./model/ckpt-shuffle1', network='inception-bn', num_classes=10, num_epochs=30, num_examples=216, train_dataset='sf1_train.rec', val_dataset='sf1_val.rec')
    2016-07-17 01:02:57,342 Node[0] finetune from model/Inception_BN at epoch 39
    [01:02:57] src/io/iter_image_recordio.cc:211: ImageRecordIOParser: ./sf1_train.rec, use 1 threads for decoding..
    [01:02:57] src/io/./iter_normalize.h:218: Cannot find mean.bin: create mean image, this will take some time...
    [01:03:11] src/io/./iter_normalize.h:231: 10000 images processed, 13.6055 sec elapsed
    [01:03:21] src/io/./iter_normalize.h:231: 20000 images processed, 23.6031 sec elapsed
    [01:03:21] src/io/./iter_normalize.h:244: Save mean image to mean.bin..
    [01:03:22] src/io/iter_image_recordio.cc:211: ImageRecordIOParser: ./sf1_val.rec, use 1 threads for decoding..
    [01:03:22] src/io/./iter_normalize.h:103: Load mean image from mean.bin
    2016-07-17 01:03:24,226 Node[0] lr_scale: {'fc1_ft_weight': 10, 'softmax_label': 10, 'fc1_ft_bias': 10}
    [01:03:24] ../mxnet/dmlc-core/include/dmlc/logging.h:235: [01:03:24] src/operator/./concat-inl.h:152: Check failed: (dshape[j]) == (tmp[j]) Incorrect shape[2]: (32,320,13,13). (first input shape: (32,576,14,14))
    Traceback (most recent call last):
      File "train_inception_bn.py", line 92, in <module>
        train_model.fit(args, net, get_iterator)
      File "../kaggle_statefarm/inception/train_model.py", line 119, in fit
        epoch_end_callback = checkpoint)
      File "../mxnet/python/mxnet/model.py", line 746, in fit
        self._init_params(dict(data.provide_data+data.provide_label))
      File "../mxnet/python/mxnet/model.py", line 486, in _init_params
        arg_shapes, _, aux_shapes = self.symbol.infer_shape(**input_shapes)
      File "../mxnet/python/mxnet/symbol.py", line 453, in infer_shape
        return self._infer_shape_impl(False, *args, **kwargs)
      File "../mxnet/python/mxnet/symbol.py", line 513, in _infer_shape_impl
        ctypes.byref(complete)))
      File "../mxnet/python/mxnet/base.py", line 77, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
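The failing check is in MXNet's Concat shape inference: inputs joined along the channel axis must agree on every other axis. A minimal pure-Python sketch of that rule (not MXNet's actual code). The function `concat_shape` and the first two branch shapes are hypothetical; only `(32, 320, 13, 13)` comes from the log.

```python
def concat_shape(shapes, dim=1):
    """Return the output shape of concatenating `shapes` along axis `dim`.

    Every axis except `dim` must match the first input; sizes along `dim`
    are summed. Raises ValueError on the first mismatching input.
    """
    out = list(shapes[0])
    for i, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(out):
            raise ValueError(f"input {i} has rank {len(shape)}, expected {len(out)}")
        for axis, (a, b) in enumerate(zip(out, shape)):
            if axis != dim and a != b:
                raise ValueError(
                    f"incorrect shape[{i}]: {tuple(shape)} (mismatch on axis {axis})"
                )
        out[dim] += shape[dim]
    return tuple(out)


# Hypothetical branch outputs: two 14x14 branches plus the 13x13 branch from the log.
branches = [(32, 160, 14, 14), (32, 96, 14, 14), (32, 320, 13, 13)]
try:
    concat_shape(branches)
except ValueError as err:
    print(err)  # incorrect shape[2]: (32, 320, 13, 13) (mismatch on axis 2)
```

Axis 2 is the feature-map height, so a single branch that comes out one pixel smaller is enough to break the whole concatenation.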

opraveen, Jul 17 '16 08:07

It seems like you used a different input shape than the 224 required by Inception-BN. VGG and Inception-BN use different input shapes, as mentioned in the Kaggle forum posts, so please do not reuse the VGG input for the Inception-BN model.
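A minimal sketch of that sanity check, assuming the training script's parsed arguments look like the `Namespace` printed in the log (`network`, `data_shape`). The table `REQUIRED_DATA_SHAPE` and the function `check_data_shape` are hypothetical; only the 224 value for Inception-BN comes from this thread, so no VGG entry is filled in.

```python
from argparse import Namespace

# Hypothetical lookup: only the Inception-BN value (224) is stated in this thread.
REQUIRED_DATA_SHAPE = {"inception-bn": 224}


def check_data_shape(args):
    """Raise ValueError if args.data_shape differs from what the network expects."""
    expected = REQUIRED_DATA_SHAPE.get(args.network)
    if expected is not None and args.data_shape != expected:
        raise ValueError(
            f"{args.network} expects data_shape={expected}, got {args.data_shape}"
        )


# Values taken from the Namespace in the log; this call passes silently.
check_data_shape(Namespace(network="inception-bn", data_shape=224))
```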

phunterlau, Jul 17 '16 18:07

This is the problem: mxnet #2585; please see https://github.com/dmlc/mxnet/pull/2585

lbin, Jul 18 '16 03:07

@lbin you are right; one needs to add pad=(1, 1) to the max-pooling layer, like this:

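    # pad=(1, 1) makes this max-pooling branch match the spatial size of the stride-2 branches it is concatenated with
    pool = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pad=(1, 1), pool_type='max', attr=mirror_attr)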

VGG does not have this problem.
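Why the padding helps can be checked with the usual output-size formula, out = floor((in + 2*pad - kernel) / stride) + 1, or with ceil in place of floor. The sketch below assumes that the referenced mxnet change switched pooling's output rounding from ceil to floor (consistent with, but not confirmed by, this thread) and uses an example 28x28 input to the reduction block, which is also an assumption. `out_size` is a hypothetical helper, not an MXNet function.

```python
import math


def out_size(size, kernel, stride, pad=0, ceil_mode=False):
    """Output length along one spatial axis for a conv/pooling window."""
    span = size + 2 * pad - kernel
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1


size = 28  # assumed example input height/width, not taken from the log
conv = out_size(size, kernel=3, stride=2, pad=1)                # stride-2 conv branch
pool_floor = out_size(size, kernel=3, stride=2)                 # floor rounding, no pad
pool_ceil = out_size(size, kernel=3, stride=2, ceil_mode=True)  # ceil rounding, no pad
pool_padded = out_size(size, kernel=3, stride=2, pad=1)         # floor rounding, pad=(1, 1)
print(conv, pool_floor, pool_ceil, pool_padded)  # 14 13 14 14
```

Under ceil rounding the unpadded pool and the conv branch agree at 14; under floor rounding the pool drops to 13, which reproduces the 13 vs 14 mismatch in the log, and pad=(1, 1) restores 14.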

phunterlau, Jul 19 '16 18:07