The consistency between the code and the description of the paper.

Open sang-yc opened this issue 4 years ago • 8 comments

Hello, I found that the M0 of the code and the M0 of the paper are not the same structure. I would like to ask whether the code of Micro-Block-A, Micro-Block-B, and Micro-Block-C is consistent with the description of the paper and whether there is any difference? Thank you.

sang-yc avatar Sep 29 '21 13:09 sang-yc

All the models are consistent with the description in the paper. Using M0 as an example, if you take a look at Table 1 in the paper, you can find the hidden dimension C/R is {8,12,16,32,64,96}, which is exactly the same as the config file used for M0.

liyunsheng13 avatar Sep 29 '21 17:09 liyunsheng13

First of all, thank you for your reply! I still have questions about Dynamic Shift Max. I have carefully studied your paper, Dynamic ReLU. As your paper says, when J = 1, Dynamic Shift Max and Dynamic ReLU are the same. Why is j taken as 2? Shouldn't j be the same as the number of groups? When the number of groups is different, shouldn't j change dynamically? Thank you!

sang-yc avatar Sep 30 '21 13:09 sang-yc

As you mentioned, when J=1, Dynamic Shift-Max is just Dynamic ReLU, with an expression like y=max(a1x1+b1x1) (x1 is the first channel of the feature map; a1 and b1 are the dynamic coefficients). When J=2, the output y becomes max(a1x1+a2x2, b1x1+b2x2), where channels x1 and x2 are fused. This is the key difference from Dynamic ReLU. In our implementation, we actually found that x2 should be obtained with a group shift instead of a channel shift, so it is x_{jC/G}. The value of J therefore has nothing to do with the number of groups; it depends only on how many channels you want to fuse, and of course J<=G.
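
For readers following along, here is a minimal NumPy sketch of the max-over-fused-channels idea described above. It is not the repository's implementation: the function name `dynamic_shift_max`, the `(K, J, C)` coefficient layout, and the use of fixed coefficient arrays (the real coefficients are dynamic, i.e. they depend on the input) are all assumptions made for illustration.

```python
import numpy as np


def dynamic_shift_max(x, coeffs, groups):
    """Illustrative sketch only (hypothetical helper, not the repo's code).

    x:      (C,) channel values at one spatial location.
    coeffs: (K, J, C) array; coeffs[k, j, i] weights the j-th group-shifted
            copy of channel i inside the k-th branch of the max.
    groups: G, so one group shift moves by C // G channels.
    """
    K, J, C = coeffs.shape
    assert x.shape == (C,)
    assert C % groups == 0 and J <= groups
    stride = C // groups
    # shifted[j, i] == x[(i + j * stride) % C]
    shifted = np.stack([np.roll(x, -j * stride) for j in range(J)])
    # branches[k, i] == sum over j of coeffs[k, j, i] * shifted[j, i]
    branches = np.einsum("kjc,jc->kc", coeffs, shifted)
    # Element-wise max over the K branches.
    return branches.max(axis=0)


x = np.array([1.0, -2.0, 3.0, -4.0])

# J=1: no channel fusion, y_i = max(a * x_i, b * x_i) with a=1.0, b=0.5.
c1 = np.stack([np.full((1, 4), 1.0), np.full((1, 4), 0.5)])
y1 = dynamic_shift_max(x, c1, groups=2)

# J=2: each branch mixes channel i with channel (i + C/G) mod C.
c2 = np.zeros((2, 2, 4))
c2[0, 0] = 1.0  # branch 0 keeps the unshifted channel
c2[1, 1] = 1.0  # branch 1 keeps the group-shifted channel
y2 = dynamic_shift_max(x, c2, groups=2)
```

With J=1 the shifted stack contains only `x` itself, so the function reduces to a per-channel max over K scaled copies of the same channel, which is the bias-free Dynamic ReLU form discussed in this thread.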

liyunsheng13 avatar Sep 30 '21 21:09 liyunsheng13

Thank you for your reply! You wrote Dynamic ReLU as y=max(a1x1+b1x1). According to your code and paper, I think the expression for Dynamic ReLU should be y=max(a1x1+b1) (x1 is the first channel of the feature map; a1 and b1 are the dynamic coefficients). I don't know whether I have misunderstood or whether it was a typo on your side.

Some parts of the code are also difficult to understand. What do the parameters of the Dynamic Shift-Max class mean? For example, at activation.py, line 111:

def __init__(self, inp, oup, reduction=4, act_max=1.0, act_relu=True, init_a=[0.0, 0.0], init_b=[0.0, 0.0], relu_before_pool=False, g=None, expansion=False)

These parameters are much more complex than those of Dynamic ReLU. I understand inp and oup, but I hope you can tell me what the others represent in Dynamic Shift-Max.

Also, the number of parameters of Dynamic ReLU is 2KC, so the number of parameters of Dynamic Shift-Max should be 2KCJ. Why is it KCJ in your paper? Thank you!

sang-yc avatar Oct 02 '21 06:10 sang-yc

Oh sorry, what I wrote was incorrect. The expression for Dynamic ReLU is y=max(a1x1, b1x1). It simply picks the feature point with the stronger activation.

Unfortunately, the input parameters concern implementation details such as initialization (init_a=[0.0, 0.0], init_b=[0.0, 0.0]), and it is hard for me to explain them all. They also have nothing to do with understanding Dynamic Shift-Max. I suggest you just run the code step by step; you will easily see how the parameters influence the implementation.

As for the number of parameters in Dynamic Shift-Max: since it uses channel shifting, it has to be implemented with more parameters. For J=2, Dynamic Shift-Max is max(a1x1+a2x2, b1x1+b2x2) with parameters a1, a2, b1 and b2, which doubles the number of parameters compared with Dynamic ReLU (y=max(a1x1, b1x1)).
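
The counting argument above can be checked with a short sketch. The helper name `shift_max_coeff_count` is hypothetical, and the sketch assumes the bias-free form used in this thread: one multiplicative coefficient per branch (K), per fused channel (J), per output channel (C).

```python
def shift_max_coeff_count(channels, k, j):
    """Coefficients for a bias-free max over k branches, each fusing j channels.

    Hypothetical helper for illustration: each of the `channels` outputs
    needs k * j multiplicative coefficients and no intercepts.
    """
    return channels * k * j


C, K = 16, 2
relu_like = shift_max_coeff_count(C, K, j=1)  # y = max(a1*x1, b1*x1)
shift_max = shift_max_coeff_count(C, K, j=2)  # four coefficients per channel
```

Under this assumption the count is KCJ, and J=2 doubles the J=1 count. The 2KC figure raised in the question would correspond to a form that also carries one intercept per branch, as in y=max(a1x1+b1).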

liyunsheng13 avatar Oct 02 '21 07:10 liyunsheng13

For M0, output channel of stem layer is 4 in the code, while it's 6 in the paper. I'm confused.

FlyMoonSky avatar Oct 11 '21 15:10 FlyMoonSky

There is no inconsistency for M0. You might read the old version of our paper.

liyunsheng13 avatar Oct 11 '21 17:10 liyunsheng13

Thank you for your kind reply! It was indeed a paper-version problem. I have now referred to the latest version.

FlyMoonSky avatar Oct 12 '21 02:10 FlyMoonSky