Train problems

Open YjDai opened this issue 4 years ago • 5 comments

CUDA_VISIBLE_DEVICES='0' python -u train.py --network y1 --loss margin_distillation --dataset emore

I have a problem:

Traceback (most recent call last):
  File "train.py", line 717, in <module>
    main()
  File "train.py", line 714, in main
    train_net(args)
  File "train.py", line 703, in train_net
    epoch_end_callback = epoch_cb )
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/base_module.py", line 498, in fit
    for_training=True, force_rebind=force_rebind)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 280, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 376, in bind_exec
    shared_group))
  File "/usr/local/lib/python2.7/dist-packages/mxnet/module/executor_group.py", line 670, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/usr/local/lib/python2.7/dist-packages/mxnet/symbol/symbol.py", line 1782, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (256, 3, 112, 112)
softmax_label: (256, 2)
Error in operator slice_axis0: [17:36:53] src/operator/tensor/./matrix_op-inl.h:1295: Check failed: *end <= axis_size: Invalid end for end=129 as axis_size is 2

YjDai, Aug 24 '20 09:08
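
For readers following the traceback: its last line says that the operator slice_axis0 requested a slice ending at index 129 along an axis that has only 2 entries. Among the listed arguments, the only axis of size 2 is the second axis of softmax_label, whose shape is (256, 2). The sketch below reproduces that bounds check in plain Python, without MXNet. Note that check_slice_end is a hypothetical helper written for illustration; it is not part of train.py or of MXNet.

```python
def check_slice_end(shape, axis, end):
    """Mimic the reported bounds check: the slice end must not exceed the axis size."""
    axis_size = shape[axis]
    if end > axis_size:
        raise ValueError(
            "Invalid end for end=%d as axis_size is %d" % (end, axis_size)
        )
    return axis_size


# Shape of softmax_label as printed in the traceback.
label_shape = (256, 2)

try:
    # The failing operator asked for a slice that ends at index 129.
    check_slice_end(label_shape, axis=1, end=129)
except ValueError as err:
    message = str(err)
    print(message)
```

Under this reading, each label row read from the dataset carries 2 values while the network symbol expects at least 129, which would fit a dataset that was not produced by the full data preparation procedure.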

How did you obtain the emore dataset? Did you perform all the steps from the "Data preparation" instructions in the README.md file?

david-svitov, Sep 24 '20 06:09

In terms of the last step, can you give an example command? Thanks!

doitslow, Sep 26 '20 08:09

We fixed the "Data preparation" instructions in README.md. Please try the new version.

david-svitov, Sep 28 '20 12:09

I have the same problem. Have you solved it?

XiXiRuPan, Jan 06 '21 14:01

I repeated the dataset creation procedure from README.md and had no problems training on the resulting dataset. I see the following as possible sources of the problem:

  1. You use '--dataset emore'. Please check that the path in 'config.py' is correct, or use '--dataset emore_soft' as proposed in the instructions.
  2. You use Python 2.7. I tested the code with Python 3.6 only.
  3. If you still have this problem, I prepared a small training dataset for you: https://drive.google.com/file/d/1qDRkI_H0RI_MIHghjvOO63ETDK4UcpJt/view?usp=sharing If the problem persists with this small dataset and Python 3.6, please let me know.

david-svitov, Jan 06 '21 17:01
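
The first two suggestions in the list above can be turned into a quick check before launching training. What follows is only a minimal sketch: it assumes the label width the network needs is at least the slice end reported in the traceback (129). The function preflight and its parameters are hypothetical names, not part of the repository, and the label width of the dataset has to be obtained separately.

```python
import sys


def preflight(label_width, required_width, python_version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # The code was reported as tested with Python 3.6 only.
    if python_version < (3, 6):
        problems.append(
            "Python {}.{} in use; the code was tested with Python 3.6 only".format(
                python_version[0], python_version[1]
            )
        )
    # Assumption: each label row must be at least as wide as the slice end.
    if label_width < required_width:
        problems.append(
            "label width is {} but the network slices up to {}; "
            "check the dataset path in config.py".format(label_width, required_width)
        )
    return problems


# Values taken from the traceback in this thread.
reported = preflight(label_width=2, required_width=129, python_version=(2, 7))
for line in reported:
    print(line)
```

With the values from the traceback, both checks report a problem, which matches the first two suggestions.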