
The number of parameters

Open shamangary opened this issue 8 years ago • 6 comments

I use the following settings, as suggested on GitHub: L=40, k=12, no bottleneck. However, the parameter count is not 1M, it's 0.6M. The same problem happens when I turn bottleneck on: I get a different parameter count than the reported one. Please tell me what I am missing. Thank you.

Calling the model:

dn_opt = {}
dn_opt.depth = 40
dn_opt.dataset = 'cifar10'
model = paths.dofile('densenet.lua')(dn_opt)
model:cuda()
print(model:getParameters():size())

In densenet.lua

    local growthRate = 12

    --dropout rate, set it to 0 to disable dropout, non-zero number to enable dropout and set drop rate
    local dropRate = 0

    --#channels before entering the first denseblock
    local nChannels = 2 * growthRate

    --compression rate at transition layers
    local reduction = 0.5

    --whether to use bottleneck structures
    local bottleneck = false

Output of the parameter size

599050
[torch.LongStorage of size 1]

shamangary avatar Apr 26 '17 08:04 shamangary

Hi! "BC" stands for bottleneck(B) and compression(C). This is explained at the "compression" paragraph at section 3 of the paper. To use a original DenseNet, you need to also set the variable "reduction" to 1 in the code.

liuzhuang13 avatar Apr 26 '17 09:04 liuzhuang13

Thank you very much. The parameter count matches now.

shamangary avatar Apr 26 '17 09:04 shamangary

On the other hand, while the number of parameters in DenseNet is indeed small, GPU memory is still consumed by the densely connected structure (the stored feature maps) rather than by the parameters.

On an 8 GB GPU I was able to train an 11M-parameter WRN, but I cannot train a 0.8M-parameter DenseNet-BC (L=100, k=12) because of out-of-memory errors. This is probably because a lot of feature maps are stored during training.
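
As a rough back-of-envelope illustration of why this happens (a sketch under assumed settings, not a measurement of any particular implementation): in a naive implementation every layer's concatenated input is a separate tensor kept for the backward pass, so the stored feature-map channels grow quadratically with the number of layers per dense block.

```lua
-- Rough count of feature-map channels a naive implementation keeps alive.
-- Assumed setup for DenseNet-BC (L=100, k=12): 3 blocks of 16 layers each.
local k, nBlocks, layersPerBlock = 12, 3, 16
local channels = 2 * k        -- channels entering the first dense block
local stored = 0
for b = 1, nBlocks do
  for l = 1, layersPerBlock do
    stored = stored + channels   -- concatenated input of this layer is kept
    channels = channels + k      -- concatenation adds k new channels
  end
  channels = math.floor(channels * 0.5)  -- transition layer, compression 0.5
end
print(stored)  -- per-block sums like 24 + 36 + ..., i.e. O(layersPerBlock^2) growth
```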

shamangary avatar Apr 26 '17 09:04 shamangary

Thanks for pointing this out. I've just found other people discussing this, and wrote a comment on Reddit here: https://www.reddit.com/r/MachineLearning/comments/67fds7/d_how_does_densenet_compare_to_resnet_and/?utm_content=title&utm_medium=hot&utm_source=reddit&utm_name=MachineLearning

My suggestion is to try a shallower and wider DenseNet, by setting depth smaller and growthRate larger, as sketched below.
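
For instance, a hypothetical shallow-and-wide configuration (these specific numbers are purely illustrative, not a tuned or recommended setting from the repo):

```lua
-- In the calling script: fewer layers -> fewer stored feature maps
dn_opt.depth = 40
-- In densenet.lua: larger k to keep model capacity up
local growthRate = 48
```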

liuzhuang13 avatar Apr 26 '17 12:04 liuzhuang13

Hello @shamangary, regarding the memory cost of feature maps: we currently have a Caffe implementation that tries to address the memory problem (listed under "much more space-efficient Caffe implementation"). DenseNet-BC (L=100, k=12) should take no more than 2.5 GB when running with testing on, and about 1.7 GB without test mode. (Caffe seems to allocate separate buffers for testing.) Hope that helps!

Tongcheng avatar Apr 26 '17 22:04 Tongcheng

OK, thanks! Though I wish Torch could also have that property. (QAQ)

shamangary avatar Apr 27 '17 02:04 shamangary