
Training ResNet-18 on ImageNet does not converge.

weitaoatvison opened this issue on Dec 07 '17 · 8 comments

Hi Wei, I downloaded your code and used it to train ResNet-18 from scratch on the ImageNet dataset. It seems hard to get it to converge with the settings below:

```
net: "./models/resnet-18-lowrank/train.prototxt"
test_iter: 2000
test_interval: 5000
test_initialization: true
display: 30
base_lr: 0.005
lr_policy: "multistep"
stepvalue: 150000
stepvalue: 300000
gamma: 0.1
max_iter: 600000
momentum: 0.9
weight_decay: 0.0001
snapshot: 6000
snapshot_prefix: "./models/resnet-18-lowrank/resnet-18"
solver_mode: GPU
force_type: "Constant"
force_decay: 0.0001
```

I also ran a comparison experiment beforehand: with the same configuration but without force_type and force_decay, and with base_lr set to 0.05 as Section 5.1 of your paper suggests, training converges quickly. Could you give me some advice?

weitaoatvison · Dec 07 '17

@weitaoatvison

  1. Did you train it from scratch or fine-tune it? It's better to fine-tune.
  2. Did you try a smaller force_decay? force_decay should vary with your network architecture.

wenwei202 · Dec 11 '17

I tried a smaller force_decay and trained it from scratch. It worked. But I still have some questions.

  1. I trained ResNet-18 with force_decay on ImageNet from scratch, and then used nn_decomposer.py to apply the low-rank decomposition with a rank ratio of 0.95 (see the decomposition sketch after this list). The original top-5 accuracy is 0.89, but after decomposition it drops to 0.34 without fine-tuning. Meanwhile I timed the model on a Titan X with CUDA 8.0 and cuDNN 5.1: the baseline takes 6.18 ms, and after the low-rank decomposition it takes 6.24 ms (7.5 ms if trained without force_decay). It seems hard to reach the 2x GPU speedup your paper reports. Is that expected?
  2. I want to know whether decomposing the network layer by layer would end up better than decomposing the whole network at once. For example, decompose the first layer, fine-tune, then decompose the next layer, and so on. Your work is very good and your advice will help me a lot. Thanks!
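For reference on how the rank ratio translates into per-layer ranks, here is a minimal numpy sketch of the kind of per-layer SVD decomposition that nn_decomposer.py applies to a convolution weight blob. The energy-based rank-selection rule, the function name, and the min_rank parameter are my assumptions for illustration, not the repository's exact code.

```python
import numpy as np

def decompose_conv(W, rank_ratio=0.95, min_rank=1):
    """Split one conv weight blob W of shape (num_output, channels, kh, kw)
    into two low-rank factors: a (rank, channels, kh, kw) conv followed by a
    (num_output, rank, 1, 1) conv."""
    n, c, kh, kw = W.shape
    W2d = W.reshape(n, c * kh * kw)                       # unfold filters into a matrix
    U, S, Vt = np.linalg.svd(W2d, full_matrices=False)
    energy = np.cumsum(S ** 2) / np.sum(S ** 2)           # cumulative energy of singular values
    rank = int(np.searchsorted(energy, rank_ratio)) + 1   # smallest rank reaching the ratio
    rank = min(max(rank, min_rank), len(S))
    W_low = np.dot(np.diag(S[:rank]), Vt[:rank]).reshape(rank, c, kh, kw)
    W_proj = U[:, :rank].reshape(n, rank, 1, 1)
    return W_low, W_proj, rank
```

Force regularization concentrates the energy into fewer singular values, so the same rank ratio yields much smaller ranks (and a smaller model) than standard training, which matches the model sizes discussed later in this thread.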

weitaoatvison · Dec 12 '17

  1. Fine-tuning is required to recover accuracy after decomposition. Please do layer-wise timing to verify where the bottleneck is (a timing sketch follows this list); the architecture of ResNet is very different from AlexNet.
  2. Not sure how much better it would be, but the fine-tuning time will increase significantly.
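A rough pycaffe sketch for the layer-wise timing suggested above. The prototxt and caffemodel paths are placeholders; Caffe's built-in `caffe time -model <prototxt> -gpu 0` tool reports per-layer forward/backward times inside the C++ loop and is the more precise option.

```python
import time
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()
# Placeholder paths: point these at your deploy prototxt and trained weights.
net = caffe.Net('deploy.prototxt', 'resnet-18.caffemodel', caffe.TEST)

net.forward()  # warm-up pass

for name in list(net._layer_names):
    t0 = time.time()
    for _ in range(50):
        net.forward(start=name, end=name)  # run only this layer
    dt = (time.time() - t0) / 50 * 1000.0
    print('%-40s %.3f ms' % (name, dt))
```

The Python call overhead and host/device synchronization are included in these numbers, so treat them as relative rather than absolute timings.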

wenwei202 · Dec 12 '17

Thanks for your answer! In your paper you show results for ResNet-20 and GoogLeNet in Figure 5. Were these trained on ImageNet? What speedup do they get on GPU? And could you share the caffemodels for a quick test? Thanks!

weitaoatvison · Dec 13 '17

ResNet-20 was trained on CIFAR-10 and GoogLeNet on ImageNet. I recommend first testing how much low-rank approximation accelerates them without force regularization. If that looks promising, you can then use force regularization for a higher speedup.

wenwei202 · Dec 13 '17

Hi, I trained ResNet-18 with a stronger force regularization term, and it reaches top-5 ~0.87 (the original is ~0.89). I decomposed this model with a rank ratio of 0.95, which shrinks the caffemodel from 48 MB to 3.2 MB (with standard training it only shrinks from 48 MB to 36 MB). Checking num_output in each layer of the generated prototxt, I found that some layers are reduced to num_output: 1. But when I fine-tune the low-rank model to recover the original accuracy, top-5 and top-1 are nearly 0 at the start of testing, and even after some epochs the result is still poor (top-5 is only ~0.30). So I want to know whether the method has a limitation when the rank drops to a very small number, even though I kept the rank ratio at 0.95?
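A quick way to see which decomposed layers collapsed to a tiny rank is to scan the generated prototxt for each convolution layer's num_output. A short sketch, assuming pycaffe is importable; the prototxt path is hypothetical:

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
# Hypothetical path to the prototxt written by nn_decomposer.py.
with open('./models/resnet-18-lowrank/train_lowrank.prototxt') as f:
    text_format.Merge(f.read(), net)

# Print the retained rank (num_output) of every convolution layer.
for layer in net.layer:
    if layer.type == 'Convolution':
        print('%-40s num_output = %d' % (layer.name, layer.convolution_param.num_output))
```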

weitaoatvison · Dec 21 '17

@weitaoatvison this is one of the open issues in this work that remains to be solved, as I mentioned here. The current strategy is to use a smaller rank ratio. Let me know if you make some progress on this issue. Thanks.

wenwei202 · Dec 24 '17

OK. I will do more work on it. Thanks!

weitaoatvison · Dec 25 '17