rethinking_bottleneck_design

Param and MAdd

Rookielike opened this issue 4 years ago • 28 comments

Hi friend, did you write your model strictly according to the paper? I have written a version of MobileNeXt, but my params and MAdds are lower than the values in the paper. I am a little confused... did you use some other structure in MobileNeXt?

Rookielike · Jul 16 '20 01:07

Same here.

d-li14 · Jul 16 '20 14:07

> Hi friend, did you write your model strictly according to the paper? I have written a version of MobileNeXt, but my params and MAdds are lower than the values in the paper. I am a little confused... did you use some other structure in MobileNeXt?

Hi, thanks for your interest in the work and for pointing out this typo! There is a typo in the table: the repeat number for the 960 × 960 block should be 3. I have corrected this in the camera-ready version but forgot to update the arXiv version.

zhoudaquan · Jul 18 '20 14:07

Thanks, but I'm not sure what 960 × 960 means.

d-li14 · Jul 18 '20 15:07

@d-li14 I think it means the layers between 7 × 7 × 576 and 7 × 7 × 960.

Rookielike · Jul 20 '20 03:07

@d-li14 Did you get the correct values for params and FLOPs?

Rookielike · Jul 21 '20 03:07

@zhoudaquan "the repeat number for 960 * 960 block should be 3" does it mean the repeat number of block between 7 * 7 * 960 and 7 * 7 * 960 is 3? (the layer from 7 * 7 * 576 to 7 × 7 × 960 contains 4 block? )

Rookielike avatar Jul 21 '20 06:07 Rookielike

@zhoudaquan "the repeat number for 960 * 960 block should be 3" does it mean the repeat number of block between 7 * 7 * 960 and 7 * 7 * 960 is 3? (the layer from 7 * 7 * 576 to 7 × 7 × 960 contains 4 block? )

Hi, there are totally 3 block from 7 * 7 * 576 to 7 * 7 * 960. 1 for transition from 7 * 7 * 576 to 7 * 7 * 960 and 2 more for 7 * 7 * 960 to 7 * 7 * 960.

There are totally 21 blocks in the network including the conv head... if there are any other questions, do leave a message...

The code has been approved by the company and I though we can release within one month hopefully.....

zhoudaquan avatar Jul 21 '20 06:07 zhoudaquan
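For concreteness, here is a minimal sketch of the corrected last stage as described above (the list layout and names are illustrative assumptions, not the authors' released code):

# Corrected last stage: 3 sandglass blocks from 7x7x576 to 7x7x960,
# i.e. one transition block (576 -> 960) plus two repeats (960 -> 960).
# Assumed tuple layout: (in_channels, out_channels, stride, reduction_ratio).
last_stage = [
    (576, 960, 1, 6),  # transition block
    (960, 960, 1, 6),  # repeat 1
    (960, 960, 1, 6),  # repeat 2
]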

> Thanks, but I'm not sure what 960 × 960 means.

@d-li14 960 × 960 refers to the input/output channel dimensions of the conv ops, exactly as pointed out by @Rookielike. Sorry for the confusion.

zhoudaquan · Jul 21 '20 06:07

Thanks for the reply. I have some other questions: did you fix the first conv2d's channels at 32? And did you fix the last conv2d's channels at 1280 when width_mult < 1?

Rookielike · Jul 21 '20 07:07

> Thanks for the reply. I have some other questions: did you fix the first conv2d's channels at 32? And did you fix the last conv2d's channels at 1280 when width_mult < 1?

@Rookielike We fix the last conv2d at 1280 channels but make the first conv2d's channels proportional to the width multiplier.

zhoudaquan · Jul 21 '20 07:07
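As a quick illustration of that rule, a sketch with assumed helper names (the rounding helper below follows the common MobileNet-style convention; it is not necessarily the authors' exact code):

def _make_divisible(v, divisor=8, min_value=None):
    # Round v to the nearest multiple of divisor, never
    # shrinking it by more than 10%.
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

width_mult = 0.75
stem_channels = _make_divisible(32 * width_mult)  # first conv scales with width_mult
last_channels = int(1280 * width_mult) if width_mult > 1.0 else 1280  # fixed when width_mult <= 1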

I have tried adding one block here, as pointed out by @Rookielike, but still cannot reach the paper's params and FLOPs.

d-li14 · Jul 21 '20 14:07

Me too. My FLOPs count is 274M. I wonder if there are any more changes.

BshoterJ · Jul 22 '20 02:07

> Me too. My FLOPs count is 274M. I wonder if there are any more changes.

@d-li14 @Rookielike Maybe you guys can post your printed models here? I can help check the difference.

zhoudaquan · Jul 22 '20 02:07

[screenshot] The values in the picture above include Conv2d, Linear, and BN.

[screenshot] The values in the picture above are without BN.

Rookielike · Jul 22 '20 02:07

> [screenshot] The values in the picture above include Conv2d, Linear, and BN. [screenshot] The values in the picture above are without BN.

@Rookielike Could you print the model with its layer configurations, the one with input/output channels?

zhoudaquan · Jul 22 '20 02:07

[screenshots of the model definition] The last SandglassBlock is not in the pictures; it is:

output_channel = int(1280 * width_mult) if width_mult > 1.0 else 1280
layers.append(block(input_channel, output_channel, 1, 6, identity_mul))

Rookielike · Jul 22 '20 02:07

@zhoudaquan I have put my model definition in https://github.com/d-li14/mobilenext.pytorch for you to check. The FLOPs count is 275M now, as also reported by others.

d-li14 · Jul 22 '20 03:07

> @zhoudaquan I have put my model definition in https://github.com/d-li14/mobilenext.pytorch for you to check. The FLOPs count is 275M now, as also reported by others.

@d-li14 @Rookielike Hi, from what I see, the hidden layer in the SGBlock should be calculated based on the output channels of the block. This makes a difference in the transition blocks, where the output channels are more than the input channels.

zhoudaquan · Jul 22 '20 07:07
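In other words, a one-line sketch using the variable names from d-li14's repo (inp, oup, reduction_ratio); this is an illustration, not the release code:

# Hidden width derived from the block's *output* channels, so a
# transition block (inp < oup) gets a wider hidden layer than it
# would if computed from inp:
hidden_dim = oup // reduction_ratio  # e.g. 960 // 6 = 160, not 576 // 6 = 96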

@zhoudaquan Still confused... at https://github.com/d-li14/mobilenext.pytorch/blob/master/mobilenext.py#L48, should we change inp to oup?

d-li14 · Jul 22 '20 07:07

> @zhoudaquan Still confused... at https://github.com/d-li14/mobilenext.pytorch/blob/master/mobilenext.py#L48, should we change inp to oup?

Yeah, and make sure the channel number is divisible by 16.

zhoudaquan · Jul 22 '20 07:07

@zhoudaquan But changing the above line to

hidden_dim = _make_divisible(round(oup // reduction_ratio), 16)

gives 370M FLOPs.

d-li14 · Jul 22 '20 07:07

> @zhoudaquan But changing the above line to hidden_dim = _make_divisible(round(oup // reduction_ratio), 16) gives 370M FLOPs.

[screenshot of the block definition]

I wrote it this way...

zhoudaquan · Jul 22 '20 07:07

So is there any difference between these two ways of writing it?

d-li14 · Jul 22 '20 08:07

@zhoudaquan Thanks for the reply. Following your guidance, I got this (the last SandglassBlock's channel number is multiplied by width_mult, except when width_mult < 1): [screenshot] Is it right?

Rookielike · Jul 22 '20 09:07

Hi friend, did you use any augmentation when training?

Rookielike · Aug 03 '20 09:08

> @zhoudaquan Thanks for the reply. Following your guidance, I got this (the last SandglassBlock's channel number is multiplied by width_mult, except when width_mult < 1): [screenshot] Is it right?

@Rookielike Hi, sorry for the late reply. I was preparing for the conference and some other projects. Yeah, when counting the MAdds with BN included, the numbers are almost the same. As a common practice, we remove the BN layers when counting the MAdds, and that gives the same numbers as I put in the paper.

zhoudaquan · Aug 04 '20 04:08
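To make that counting convention concrete, here is a minimal PyTorch sketch (an illustration, not the authors' counting script) that sums multiply-adds for Conv2d and Linear layers only, deliberately skipping BN:

import torch
import torch.nn as nn

def count_madds(model, input_size=(1, 3, 224, 224)):
    # Sum MAdds of Conv2d and Linear layers via forward hooks;
    # BatchNorm2d (and activations) are deliberately not counted.
    total = 0
    hooks = []

    def conv_hook(module, inputs, output):
        nonlocal total
        kh, kw = module.kernel_size
        # MAdds per output element = kernel area * in_channels / groups
        total += kh * kw * (module.in_channels // module.groups) * output.numel()

    def linear_hook(module, inputs, output):
        nonlocal total
        total += module.in_features * output.numel()

    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            hooks.append(m.register_forward_hook(conv_hook))
        elif isinstance(m, nn.Linear):
            hooks.append(m.register_forward_hook(linear_hook))

    with torch.no_grad():
        model(torch.zeros(input_size))
    for h in hooks:
        h.remove()
    return total

Adding or omitting a BatchNorm2d hook in a counter like this is what separates the "with BN" and "without BN" numbers posted earlier in the thread.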

> Hi friend, did you use any augmentation when training?

@Rookielike Besides mean/var normalization, we only use color jitter and bilinear interpolation.

zhoudaquan · Aug 04 '20 04:08
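For reference, a training-transform sketch matching that description (torchvision; the jitter strengths below are assumptions, since the thread does not give exact values):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),  # bilinear interpolation by default
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # assumed strengths
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet mean/var norm
                         std=[0.229, 0.224, 0.225]),
])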

As a data point, I tried to replicate the model in TF2 and got a similar number of parameters to those reported above. This is for MobileNeXt 1.0, so make_divisible shouldn't matter here, I guess?

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
mobilenext_backbone (Functio (None, 7, 7, 1280)        1967760   
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1280)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 1280)              0         
_________________________________________________________________
predictions (Dense)          (None, 1000)              1281000   
=================================================================
Total params: 3,248,760
Trainable params: 3,208,040
Non-trainable params: 40,720
_________________________________________________________________ 

ethanyanjiali · Dec 06 '20 17:12