Slimming Resnet

Open hiyijian opened this issue 8 years ago • 10 comments

Dear @liuzhuang13, I guess that after pruning the current layer we should also prune the corresponding channels of the subsequent conv layer's kernels. Am I right? If so, I cannot figure out how to slim a residual block using your method. [image] The two branches may have different channels pruned, so can we only prune the intersection of both?

[image] Almost the same situation arises in the shortcut version. How do you handle this?

Thanks

hiyijian avatar Sep 21 '17 10:09 hiyijian

In our models, the residual branch is BN-RELU-CONV-BN-RELU-CONV-BN-RELU-CONV.

In the addition, all features from the identity mapping and from the last CONV in the residual branch are kept, so the main branch has the original width of the ResNet. Pruning only happens in layers inside the residual branch.

Inside each residual branch:

  1. In the first BN layer, if we detect very small scaling parameters, we mask out the corresponding channels before the first BN layer with a channel selection layer. (This channel selection causes a time overhead, so I don't recommend doing it in practice.)

  2. The last CONV outputs the same number of channels as the main branch (there's no BN to do selection).

  3. For the other intermediate layers, the pruning is the same as in a plain network (e.g., VGG).

If your residual branch is different from ours, you may need to modify the pruning process. The key point is that the main branch doesn't get slimmed; pruning happens only inside the residual branch. How you prune there depends on how you order your BN and CONV layers.
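The three rules above can be sketched as a small planning function. This is only an illustration in plain Python, not the code from the paper or the released repository: the names `keep_indices` and `plan_bottleneck`, the single global `threshold`, and the example scaling factors are all assumptions, and the degenerate case where every channel of a layer falls below the threshold is not handled.

```python
def keep_indices(gammas, threshold):
    """Indices of channels whose |BN scaling factor| exceeds the threshold."""
    return [i for i, g in enumerate(gammas) if abs(g) > threshold]


def plan_bottleneck(bn1, bn2, bn3, threshold):
    """Pruned layer shapes for one BN-ReLU-CONV x3 residual branch.

    bn1, bn2, bn3 are the scaling factors of the three BN layers, in order.
    The identity path keeps its full width, len(bn1).
    """
    main_width = len(bn1)                # main branch, never pruned
    sel = keep_indices(bn1, threshold)   # rule 1: channel selection on the block input
    mid1 = keep_indices(bn2, threshold)  # rule 3: surviving outputs of the first CONV
    mid2 = keep_indices(bn3, threshold)  # rule 3: surviving outputs of the second CONV
    return {
        "select": sel,
        "conv1": (len(sel), len(mid1)),    # (in_channels, out_channels)
        "conv2": (len(mid1), len(mid2)),
        "conv3": (len(mid2), main_width),  # rule 2: output matches the main branch
    }


# Hypothetical scaling factors for a block with main-branch width 4.
plan = plan_bottleneck(
    bn1=[0.9, 0.001, 0.5, 0.0],
    bn2=[0.7, 0.002],
    bn3=[0.003, 0.8],
    threshold=0.01,
)
print(plan)
```

Because the last CONV always maps back to `main_width`, the addition with the identity path needs no channel matching, which is why the two branches never have to agree on which channels were pruned.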

liuzhuang13 avatar Sep 22 '17 00:09 liuzhuang13

Thanks. Do you think the sparsity will be affected if the BN layers on the main branch are not penalized by the L1 norm? If yes, how? Thanks

hiyijian avatar Sep 22 '17 11:09 hiyijian

By "main branch" I mean the identity shortcut throughout the network, so there are no BN layers in the main branch. Wherever there is a BN, we can do channel pruning or selection according to its scaling parameters. Thanks!

liuzhuang13 avatar Sep 22 '17 22:09 liuzhuang13

Hi @liuzhuang13, could you release the code for DenseNet slimming? Thank you

youngfly11 avatar Oct 19 '17 12:10 youngfly11

Hi @youngfly11, thanks for your interest. The DenseNet code is a little different from the VGG code. Unfortunately I am busy with other things now, so I will probably release the code when I have time next month.

The way I implemented DenseNet slimming saves parameters and FLOPs, but it does not bring a speedup in the current Torch package. I implemented it using a channel selection layer, which leads to slower inference than a normal network because it involves a memory copy rather than an in-place selection.

If you just want the same speed as a normal network, after training you can set the low scaling factors and their corresponding biases to 0 and not apply gradient updates to them. This is equivalent to actually pruning the channels.
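A minimal sketch of that masking idea, in plain Python lists rather than Torch tensors. The helper names `zero_small_channels` and `masked_sgd_step`, the threshold, and the numbers are hypothetical; in a real framework the same mask would be applied to the BN weight and bias tensors and to their gradients.

```python
def zero_small_channels(gammas, betas, threshold):
    """Zero BN scaling factors below the threshold and their matching biases.

    Returns the masked gammas, the masked betas, and the 0/1 mask itself.
    """
    mask = [1.0 if abs(g) > threshold else 0.0 for g in gammas]
    new_gammas = [g * m for g, m in zip(gammas, mask)]
    new_betas = [b * m for b, m in zip(betas, mask)]
    return new_gammas, new_betas, mask


def masked_sgd_step(params, grads, mask, lr):
    """Plain SGD update that leaves masked-out (zeroed) entries untouched."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


# Hypothetical BN parameters for a two-channel layer.
gammas, betas, mask = zero_small_channels([1.0, 0.001], [0.1, 0.2], threshold=0.01)
gammas = masked_sgd_step(gammas, grads=[1.0, 1.0], mask=mask, lr=0.25)
print(gammas, betas, mask)
```

With both the scaling factor and the bias at 0, the BN output for that channel is constantly 0, so it contributes nothing to the following layers, while the tensor shapes and therefore the inference speed stay those of the unpruned network.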

Thanks

liuzhuang13 avatar Oct 19 '17 19:10 liuzhuang13

In case you're still interested, we've released our PyTorch implementation at https://github.com/Eric-mingjie/network-slimming, which supports ResNet and DenseNet.

liuzhuang13 avatar Jul 06 '18 11:07 liuzhuang13

Thanks

hiyijian avatar Jul 13 '18 06:07 hiyijian

Thanks for your wonderful work. But what if the residual branch is CONV-RELU-BN-CONV-RELU-BN-CONV-RELU-BN? Then the number of channels in the residual branch differs from that of the main branch. How should I handle this situation? Thank you.

yyjabidintg avatar Oct 25 '18 09:10 yyjabidintg

Hi, have you solved this problem? I have also run into this issue.

toyal avatar Oct 25 '19 02:10 toyal

Hi, how do you handle this situation? Thanks.

toyal avatar Oct 25 '19 03:10 toyal