Zhuang Liu
The (sub)gradient of the absolute value function (the L1 sparsity loss) is the sign function. Here we compute the subgradient directly without defining a loss.
Because the absolute value function is not differentiable at x=0, it is a subgradient instead of a gradient. But in practice, the weight x never becomes exactly 0, so it is actually...
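As a minimal sketch of this idea (assuming a PyTorch model where the L1 penalty is placed on the BatchNorm scaling factors, and a hypothetical sparsity coefficient `s`), the subgradient can be added to the existing gradients between the backward pass and the optimizer step:

```python
import torch
import torch.nn as nn

def add_l1_subgradient(model, s=1e-4):
    """Add the subgradient of s * |gamma| (the L1 sparsity term) to the
    gradients of every BatchNorm scaling factor, without defining a loss.
    `s` is a hypothetical sparsity coefficient used for illustration."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # d/dgamma of s * |gamma| is s * sign(gamma) wherever gamma != 0
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

# usage: call between loss.backward() and optimizer.step()
# loss.backward()
# add_l1_subgradient(model, s=1e-4)
# optimizer.step()
```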
Because these are two different models, the algorithm prunes different parts of the networks. Even if you prune a fixed fraction of channels (40% in this case), FLOPs will...
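To see why the same global channel-pruning ratio can give different FLOP reductions, here is a rough sketch (the layer shapes below are made up for illustration): a convolution's cost depends on its input channels, output channels, and spatial resolution, so removing the same fraction of channels in different layers removes different amounts of computation.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Approximate multiply-accumulate count of a k x k convolution
    applied to an h x w feature map (bias and padding ignored)."""
    return h * w * k * k * c_in * c_out

# Two hypothetical layers, each with 40% of output channels pruned:
early        = conv_flops(56, 56, 3, 64, 64)    # early layer: large map, few channels
late         = conv_flops(7, 7, 3, 512, 512)    # late layer: small map, many channels
early_pruned = conv_flops(56, 56, 3, 64, int(64 * 0.6))
late_pruned  = conv_flops(7, 7, 3, 512, int(512 * 0.6))

print(early - early_pruned, late - late_pruned)  # different absolute FLOP savings
```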
Thanks for your interest. We prune channels according to the BN scaling factors: after this process we set the small factors (and the corresponding biases) to 0, then we see which channels...
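A minimal sketch of this step, assuming a PyTorch model and a hypothetical global pruning ratio `percent`: gather all BN scaling factors, pick a global threshold so that `percent` of them fall below it, and zero out those factors together with their biases.

```python
import torch
import torch.nn as nn

def zero_small_bn_factors(model, percent=0.4):
    """Zero out the smallest BN scaling factors (and their biases) so that
    roughly `percent` of all channels are disabled. `percent` is an
    illustrative argument name, not the repo's exact interface."""
    factors = torch.cat([m.weight.data.abs().clone()
                         for m in model.modules()
                         if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.sort(factors)[0][int(factors.numel() * percent)]

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = m.weight.data.abs().gt(threshold).float()
            m.weight.data.mul_(mask)   # zero out small scaling factors
            m.bias.data.mul_(mask)     # and the corresponding biases
    return threshold
```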
1. In this basic DenseNet you can only prune outgoing weights. For example, if you set 10 of the 36 weights and biases in these parameters:
`module.features.denseblock_1.dense_basicblock_2.conv_33.norm.weight : torch.Size([36])`
`module.features.denseblock_1.dense_basicblock_2.conv_33.norm.bias :` ...
1. I wrote a channel selection layer and placed it before the batch normalization layer. This layer selects channels using the indices of the selected channels as its parameter. But...
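A minimal sketch of such a channel selection layer, based on my reading of the description above (the class and attribute names are illustrative, not necessarily the exact implementation): it stores one indicator per input channel as its parameter and passes through only the channels whose indicator is nonzero.

```python
import torch
import torch.nn as nn

class ChannelSelection(nn.Module):
    """Pass through only the selected channels.
    `indexes` holds one indicator per input channel; entries set to 0
    drop the corresponding channel. (Sketch for illustration.)"""

    def __init__(self, num_channels):
        super().__init__()
        self.indexes = nn.Parameter(torch.ones(num_channels))

    def forward(self, x):
        # indices of channels whose indicator is nonzero
        selected = torch.nonzero(self.indexes.data, as_tuple=False).squeeze(1)
        return x[:, selected, :, :]
```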
In case you're still interested, we've released our PyTorch implementation here: https://github.com/Eric-mingjie/network-slimming, which supports ResNet and DenseNet.
Hi @WenzhMicrosoft, sorry, what do you mean by in-place BatchNorm? We know Torch supports in-place ReLU, but we're not aware of an in-place BatchNorm layer.
Sorry, I thought the issue was opened on our Torch repo. Please ignore the comment above. The reason is that in-place BatchNorm layers will overwrite the incoming feature maps, which...
Hi! "BC" stands for bottleneck(B) and compression(C). This is explained at the "compression" paragraph at section 3 of the paper. To use a original DenseNet, you need to also set...