Zhuang Liu
The (sub)gradient of the absolute value function (the L1 sparsity loss) is the sign function. Here we compute the subgradient directly without defining a loss.
Because the absolute value function is not differentiable at x=0, it is a subgradient instead of a gradient. But in practice, the weight x never becomes exactly 0, so it is actually...
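As a minimal sketch of this idea (assuming a PyTorch model where the L1 penalty is placed on the BatchNorm scaling factors, and a hypothetical sparsity coefficient `s`), the subgradient can be added to the existing gradients between the backward pass and the optimizer step:

```python
import torch
import torch.nn as nn

def add_l1_subgradient(model, s=1e-4):
    """Add the subgradient of s * |gamma| (the L1 sparsity term) to the
    gradients of every BatchNorm scaling factor, without defining a loss.
    `s` is a hypothetical sparsity coefficient used for illustration."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # d/dgamma of s * |gamma| is s * sign(gamma) wherever gamma != 0
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

# usage: call between loss.backward() and optimizer.step()
# loss.backward()
# add_l1_subgradient(model, s=1e-4)
# optimizer.step()
```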
Because these are two different models, the algorithm prunes different parts of the networks. Even if you prune a fixed fraction of channels (40% in this case), FLOPs will...
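To see why the same global channel-pruning ratio can give different FLOP reductions, here is a rough sketch (the layer shapes below are made up for illustration): a convolution's cost depends on its input channels, output channels, and spatial resolution, so removing the same fraction of channels in different layers removes different amounts of computation.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Approximate multiply-accumulate count of a k x k convolution
    applied to an h x w feature map (bias and padding ignored)."""
    return h * w * k * k * c_in * c_out

# Two hypothetical layers, each with 40% of output channels pruned:
early        = conv_flops(56, 56, 3, 64, 64)    # early layer: large map, few channels
late         = conv_flops(7, 7, 3, 512, 512)    # late layer: small map, many channels
early_pruned = conv_flops(56, 56, 3, 64, int(64 * 0.6))
late_pruned  = conv_flops(7, 7, 3, 512, int(512 * 0.6))

print(early - early_pruned, late - late_pruned)  # different absolute FLOP savings
```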
Thanks for your interest. We prune channels according to the BN scaling factors: after this process we set the small factors (and the corresponding biases) to 0, then we see which channels...
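A minimal sketch of this step, assuming a PyTorch model and a hypothetical global pruning ratio `percent`: gather all BN scaling factors, pick a global threshold so that `percent` of them fall below it, and zero out those factors together with their biases.

```python
import torch
import torch.nn as nn

def zero_small_bn_factors(model, percent=0.4):
    """Zero out the smallest BN scaling factors (and their biases) so that
    roughly `percent` of all channels are disabled. `percent` is an
    illustrative argument name, not the repo's exact interface."""
    factors = torch.cat([m.weight.data.abs().clone()
                         for m in model.modules()
                         if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.sort(factors)[0][int(factors.numel() * percent)]

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = m.weight.data.abs().gt(threshold).float()
            m.weight.data.mul_(mask)   # zero out small scaling factors
            m.bias.data.mul_(mask)     # and the corresponding biases
    return threshold
```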
1. In this basic DenseNet you can only prune outgoing weights. For example, if you set 10 of the 36 weights and biases in these parameters:
`module.features.denseblock_1.dense_basicblock_2.conv_33.norm.weight : torch.Size([36])`
`module.features.denseblock_1.dense_basicblock_2.conv_33.norm.bias :` ...
1. I wrote a channel selection layer and placed it before the batch normalization layer. This layer selects channels using the indices of the selected channels as its parameter. But...
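A minimal sketch of such a channel selection layer, based on my reading of the description above (the class and attribute names are illustrative, not necessarily the exact implementation): it stores one indicator per input channel as its parameter and passes through only the channels whose indicator is nonzero.

```python
import torch
import torch.nn as nn

class ChannelSelection(nn.Module):
    """Pass through only the selected channels.
    `indexes` holds one indicator per input channel; entries set to 0
    drop the corresponding channel. (Sketch for illustration.)"""

    def __init__(self, num_channels):
        super().__init__()
        self.indexes = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):
        # indices of channels whose indicator is nonzero
        selected = torch.nonzero(self.indexes.data, as_tuple=False).squeeze(1)
        return x[:, selected, :, :]
```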
In case you're still interested, we've released our PyTorch implementation here: https://github.com/Eric-mingjie/network-slimming, which supports ResNet and DenseNet.
Hi @WenzhMicrosoft, sorry, what do you mean by in-place BatchNorm? We know Torch supports in-place ReLU, but we're not aware of an in-place BatchNorm layer.
Sorry, I thought the issue was opened on our Torch repo. Please ignore the comment above. The reason is that in-place BatchNorm layers will overwrite the incoming feature maps, which...
Hi! "BC" stands for bottleneck(B) and compression(C). This is explained at the "compression" paragraph at section 3 of the paper. To use a original DenseNet, you need to also set...