rethinking_bottleneck_design
Param and MAdd
Hi friend, did you write your model strictly according to the paper? I have written a version of MobileNeXt, but the params and MAdds are lower than the values in the paper. I am a little confused... did you use some other structure in MobileNeXt?
The same.
Hi, thanks for your interest in the work and for pointing out this typo! There is a typo in the table: the repeat number for the 960x960 block should be 3. I have corrected this in the camera-ready version but forgot to correct the one in the arXiv version.
Thanks but I'm not sure what 960x960 means.
@d-li14 I think it means the layers between 7×7×576 and 7×7×960.
@d-li14 did you get the correct values of params and FLOPs?
@zhoudaquan "the repeat number for 960 * 960 block should be 3" does it mean the repeat number of block between 7 * 7 * 960 and 7 * 7 * 960 is 3? (the layer from 7 * 7 * 576 to 7 × 7 × 960 contains 4 block? )
@zhoudaquan "the repeat number for 960 * 960 block should be 3" does it mean the repeat number of block between 7 * 7 * 960 and 7 * 7 * 960 is 3? (the layer from 7 * 7 * 576 to 7 × 7 × 960 contains 4 block? )
Hi, there are 3 blocks in total from 7×7×576 to 7×7×960: 1 for the transition from 7×7×576 to 7×7×960 and 2 more from 7×7×960 to 7×7×960.
There are 21 blocks in total in the network, including the conv head... if there are any other questions, do leave a message...
The code has been approved by the company and I think we can release it within one month, hopefully...
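In case the block layout is the sticking point, here is a minimal sketch of the 7×7×576 → 7×7×960 stage as described above (the block signature follows the SandglassBlock snippet posted later in this thread; the function name is illustrative, not the released code):

```python
# Illustrative sketch only, not the released code.
# Stage from 7x7x576 to 7x7x960: 1 transition block + 2 repeated blocks = 3 in total.
def build_960_stage(block, layers, identity_mul):
    in_ch, out_ch = 576, 960
    # Transition block: channels change from 576 to 960 (stride 1, ratio 6).
    layers.append(block(in_ch, out_ch, 1, 6, identity_mul))
    # Two more blocks from 960 to 960, giving the corrected repeat number of 3.
    for _ in range(2):
        layers.append(block(out_ch, out_ch, 1, 6, identity_mul))
    return layers
```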
@d-li14 960×960 refers to the channel dimensions of the conv ops... exactly as pointed out by @Rookielike... sorry for the confusion.
Thanks for the reply, and I have some other questions: did you fix the first conv2d channels to 32? Did you fix the last conv2d channels to 1280 when width_mult < 1?
@Rookielike We fix the last 1280 dimension but make the first conv2d proportional to the width multiplier.
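If it helps to see that rule spelled out, here is a rough sketch (the helper and function names are mine, and whether the stem channels also go through _make_divisible is an assumption on my part):

```python
def _make_divisible(v, divisor=8):
    # Common MobileNet-style rounding: snap to a multiple of `divisor`,
    # never dropping more than 10% of the original value.
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

def stem_and_head_channels(width_mult):
    # First conv2d: proportional to the width multiplier (base of 32, per the question above).
    first = _make_divisible(32 * width_mult)
    # Last conv2d: fixed at 1280, only scaled up when width_mult > 1.0
    # (matches the SandglassBlock snippet posted below).
    last = int(1280 * width_mult) if width_mult > 1.0 else 1280
    return first, last
```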
I have tried to add one block here, as pointed out by @Rookielike, but still cannot reach the paper's params and FLOPs.
Me too, my FLOPs count is 274M. I wonder if there are any more changes.
@d-li14 @Rookielike Maybe you guys can put your printed model here? I can help check the difference....
The values in the picture above include Conv2d, Linear, and BN.
The values in the picture above are without BN.
@Rookielike Could you help print the model with layer configurations? The one with input/output channels.
The last SandglassBlock is not in the picture; it is:
output_channel = int(1280 * width_mult) if width_mult > 1.0 else 1280
layers.append(block(input_channel, output_channel, 1, 6, identity_mul))
@zhoudaquan I have put my model definition in https://github.com/d-li14/mobilenext.pytorch for your checking. The FLOPs count is 275M now, as also shown by others.
@d-li14 @Rookielike Hi, from what I see, the hidden layer in the SGBlock is calculated based on the output channels of the block. This makes a difference in the transition blocks, where the output channels are larger than the input channels.
@zhoudaquan Still confused... at https://github.com/d-li14/mobilenext.pytorch/blob/master/mobilenext.py#L48, should we change inp to oup?
Yeah, and make sure the channel number is divisible by 16.
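To make the difference concrete, here is a tiny worked example for the 576 → 960 transition block (the reduction ratio of 6 and the exact _make_divisible rounding are my assumptions, taken from the snippets in this thread):

```python
def _make_divisible(v, divisor=16):
    # Snap to a multiple of `divisor`, never dropping more than 10%.
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

inp, oup, reduction_ratio = 576, 960, 6   # the transition block discussed above

hidden_from_inp = _make_divisible(inp // reduction_ratio)  # 96,  computed from inp (as in the linked repo line)
hidden_from_oup = _make_divisible(oup // reduction_ratio)  # 160, computed from oup (as suggested here)
print(hidden_from_inp, hidden_from_oup)  # 96 160
```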
@zhoudaquan but changing the above line to
hidden_dim = _make_divisible(round(oup // reduction_ratio), 16)
gives 370M FLOPs.
I write it this way...
So is there any difference between these two ways of writing it?
@zhoudaquan
Thanks for the reply. According to your guidance, I got this (the last SandglassBlock's channel count is multiplied by width_mult except when width_mult < 1):
Is it right?
Hi friend, did you use any augmentation when training?
@Rookielike Hi, sorry for the late reply. I was preparing for the conference and some other projects. Yeah, when counting the MAdds with BN, the numbers are almost the same. As a common practice, we remove the BN layers when counting the MAdds, and that gives the same number as I put in the paper.
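For anyone comparing numbers, here is a rough hook-based sketch of counting MAdds with and without BN layers (my own helper, not the counting script used for the paper; tools also differ on whether conv biases are included):

```python
import torch
import torch.nn as nn

def count_madds(model, input_size=(1, 3, 224, 224), include_bn=False):
    # Rough MAdd counter via forward hooks: Conv2d + Linear, optionally BatchNorm2d.
    totals = {"madds": 0}

    def conv_hook(m, inp, out):
        # One multiply-accumulate per kernel element per output element.
        kernel_ops = (m.in_channels // m.groups) * m.kernel_size[0] * m.kernel_size[1]
        totals["madds"] += out.numel() * kernel_ops

    def linear_hook(m, inp, out):
        totals["madds"] += out.numel() * m.in_features

    def bn_hook(m, inp, out):
        # Scale-and-shift per element, only counted when include_bn=True.
        totals["madds"] += out.numel()

    hooks = []
    for mod in model.modules():
        if isinstance(mod, nn.Conv2d):
            hooks.append(mod.register_forward_hook(conv_hook))
        elif isinstance(mod, nn.Linear):
            hooks.append(mod.register_forward_hook(linear_hook))
        elif include_bn and isinstance(mod, nn.BatchNorm2d):
            hooks.append(mod.register_forward_hook(bn_hook))

    model.eval()
    with torch.no_grad():
        model(torch.randn(*input_size))
    for h in hooks:
        h.remove()
    return totals["madds"]
```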
@Rookielike Besides mean/var normalization, we only use color jitter and bilinear interpolation.
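For reference, a rough torchvision sketch of a pipeline matching that description (the crop size, jitter strengths, and normalization statistics are my assumptions, not confirmed settings):

```python
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Rough sketch: bilinear-interpolated crop, color jitter, then mean/var normalization.
# Parameter values are standard ImageNet defaults, assumed rather than confirmed.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, interpolation=InterpolationMode.BILINEAR),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```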
As a data point, I tried to replicate the model in TF2 and got a similar number of parameters to those reported above. This is for MobileNeXt 1.0, so _make_divisible shouldn't matter here, I guess?
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
mobilenext_backbone (Functio (None, 7, 7, 1280) 1967760
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1280) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 1280) 0
_________________________________________________________________
predictions (Dense) (None, 1000) 1281000
=================================================================
Total params: 3,248,760
Trainable params: 3,208,040
Non-trainable params: 40,720
_________________________________________________________________