DenseNetCaffe

About the number of feature maps in the first block and the number of "conv" layers in the BC model

Open mjohn123 opened this issue 8 years ago • 7 comments

Hi, I read your code and saw that the number of feature maps before the first dense block is twice the growth rate k. Can I choose another multiple, like three times or four times?

About the number of "conv" layers: for example, DenseNet-121 (BC) uses 6, 12, 24, 16. Do you have any rule/hint for designing these numbers? What happens if I choose them all equal?

Thanks in advance

mjohn123 avatar Apr 21 '17 02:04 mjohn123

Thank you for your interest.

  1. Yes, choosing 3 or 4 times is OK, and possibly better than 2 times. In our trials, 2 times worked better than 1 time; 3 or 4 times are probably even better than 2, but making the network too wide may not be the best choice.

  2. We followed the ResNet paper when designing those numbers. The exact numbers differ, but the trend is similar: the ResNet paper uses a different number of layers per stage for its ImageNet models, but the same number of layers per stage for its CIFAR models. If you choose them all equal, the first stage may consume a very large amount of memory, since its feature maps are large (56x56); see the rough estimate below.
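
A rough back-of-the-envelope estimate of this memory effect, assuming DenseNet-121-like numbers (growth rate k = 32, 64 channels entering the first block, 0.5 compression at transitions, and the usual 56/28/14/7 spatial sizes); the exact constants are illustrative, not taken from the repo:

```python
def block_activations(h, w, c_in, n_layers, k=32):
    """Approximate activation count (elements) for one dense block:
    layer i sees the concatenation of all earlier maps, i.e.
    h * w * (c_in + i * k) input elements."""
    return sum(h * w * (c_in + i * k) for i in range(n_layers))

for name, schedule in [("staged 6-12-24-16", [6, 12, 24, 16]),
                       ("equal 14-14-14-14", [14, 14, 14, 14])]:
    c, per_block = 64, []
    for size, n in zip([56, 28, 14, 7], schedule):
        per_block.append(block_activations(size, size, c, n))
        c = (c + n * 32) // 2   # transition: compression halves channels
    print(name, [f"{a / 1e6:.1f}M elems" for a in per_block])
```

With the staged schedule no single block dominates (roughly 2.7-2.9M elements for each of the first three blocks), while the equal schedule puts roughly 12M elements in the 56x56 stage alone.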

liuzhuang13 avatar Apr 21 '17 07:04 liuzhuang13

Thanks for the quick response.

  1. In ResNet, the bottleneck is Conv1x1-Conv3x3-Conv1x1, but your bottleneck is Conv1x1-Conv3x3. What about the last Conv1x1? Regarding equal numbers of "conv" layers in the dense blocks, I mean all equal to 6 (6-6-6-6 instead of 6-12-24-16). I ask because when I increase the number of "conv" layers in later dense blocks, it does not improve performance (sometimes it decreases; I do not use dropout).

Do you have any schedule for releasing the prototxt for DenseNet-BC? I found someone who has done it, but it would be better to see the official version.

mjohn123 avatar Apr 21 '17 08:04 mjohn123

  1. In ResNet, the output of a residual block (1x1-3x3-1x1) needs to have the same dimension as the input in order to do the summation, so you first reduce the dimension with a 1x1 conv and then increase it back with another 1x1 conv. The purpose of our first 1x1 conv is also to reduce the dimension, but no increase back is needed because there is no summation at the end (see the sketch after this list).

  2. If you use 6-6-6-6 the network will be too shallow, but if you use something like 12-12-12-12, the first stage will consume too much memory. So using fewer layers in the early stages makes more sense. In our experiments, adding layers rarely decreased performance. Could you share with us your number of layers in each stage before and after the change?

  3. To be honest, we don't have a prototxt for DenseNet-BC right now because we run our experiments mainly in Torch. But the prototxt at https://github.com/shicai/DenseNet-Caffe is very nice. Note that it is only for ImageNet training; for CIFAR you need slight modifications because of the different image sizes. Based on our non-BC structure for CIFAR, it is very straightforward to obtain the BC structure once you understand what "Bottleneck" and "Compression" mean.
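
For concreteness, here is a minimal PyTorch sketch of the BC bottleneck layer described in point 1 (the repo itself is Caffe/Torch; this is just an illustration, with `k` denoting the growth rate):

```python
import torch
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """One DenseNet-BC layer: BN-ReLU-Conv1x1 reduces the input to 4*k
    channels, then BN-ReLU-Conv3x3 produces k new feature maps. The
    output is concatenated with the input; no trailing 1x1 conv is
    needed because there is no residual summation."""
    def __init__(self, c_in, k):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(c_in)
        self.conv1 = nn.Conv2d(c_in, 4 * k, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(4 * k)
        self.conv2 = nn.Conv2d(4 * k, k, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return torch.cat([x, out], dim=1)  # dense connectivity: concat, not add

# Usage: each layer adds k channels to the concatenated feature map.
layer = BottleneckLayer(c_in=64, k=32)
y = layer(torch.randn(1, 64, 56, 56))  # y has 64 + 32 = 96 channels
```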

liuzhuang13 avatar Apr 21 '17 09:04 liuzhuang13

Thanks for the details. Here is my layer-count information for the dense blocks:

  • Case 1. Using 4-4-4-4
  • Case 2. Using 4-8-12-8

The growth rate for both cases is 8, the first conv outputs 32 feature maps, and dropout = 0 (channel counts are worked out below).

Let's try it with C10 for less training time.
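
Just to pin down the two configurations, a small sketch of the channel counts entering each block, assuming the repo's non-BC structure (transition layers keep the channel count, i.e. compression theta = 1), growth rate k = 8, and 32 initial feature maps:

```python
k = 8
for name, schedule in [("case 1", [4, 4, 4, 4]), ("case 2", [4, 8, 12, 8])]:
    c, entering = 32, []          # 32 maps out of the initial convolution
    for n in schedule:
        entering.append(c)
        c += n * k                # each layer adds k feature maps
    print(name, entering, "-> final:", c)
# case 1 [32, 64, 96, 128] -> final: 160
# case 2 [32, 64, 128, 224] -> final: 288
```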

mjohn123 avatar Apr 21 '17 09:04 mjohn123

Thanks. So you found case 2 achieves worse accuracy than case 1?

liuzhuang13 avatar Apr 21 '17 09:04 liuzhuang13

Yes. I did not test it in Torch; I only tested in Caffe. Could you verify it in Torch?

mjohn123 avatar Apr 21 '17 09:04 mjohn123

Ok, I'll try it in Torch when there is a free GPU.

liuzhuang13 avatar Apr 21 '17 09:04 liuzhuang13