keras-shufflenetV2
Is there a bug in the shuffle_unit function?
At stage 2, if out_channels is 116, both branches in this code output 116 channels, so stage 2 would have 232 channels after the concat, wouldn't it?
Let me check.
The concat is good, because there is a channel split later.
But I find that the total params are larger than in the paper, and the first conv shape seems wrong. I'm wondering whether this model has been trained; could you reproduce the same accuracy as the paper?
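As a toy sketch of how channel split is meant to keep the width constant in a non-strided ShuffleNetV2 unit (plain NumPy shapes only, not the repo's code; 116 is the stage-2 width quoted in this thread):

```python
import numpy as np

# Toy stand-in for a stage-2 feature map: batch of one, 56x56 spatial,
# 116 channels (the 1x-model stage-2 width quoted in this thread).
x = np.zeros((1, 56, 56, 116))

# Channel split: each branch receives half of the input channels.
left, right = np.split(x, 2, axis=-1)

# The right branch's convolutions keep its 58 channels, so concatenating
# the two halves restores the unit's full width of 116, not 232.
out = np.concatenate([left, right], axis=-1)
print(left.shape[-1], out.shape[-1])  # 58 116
```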
Total params: 5,043,740
Trainable params: 5,015,620
Non-trainable params: 28,120
input_1 (InputLayer) (None, 224, 224, 3) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 112, 112, 24) 648 input_1[0][0]
__________________________________________________________________________________________________
maxpool1 (MaxPooling2D) (None, 56, 56, 24) 0 conv1[0][0]
__________________________________________________________________________________________________
stage2/block1/1x1conv_1 (Conv2D (None, 56, 56, 116) 2900 maxpool1[0][0]
__________________________________________________________________________________________________
stage2/block1/bn_1x1conv_1 (Bat (None, 56, 56, 116) 464 stage2/block1/1x1conv_1[0][0]
Hi cloudseasail: due to lack of time and computing resources, I did not reproduce the result.
Where is the first conv shape wrong?
Comparing this implementation with the official PyTorch implementation makes this clear.
This is the official PyTorch implementation. Note the filter count of 58 on the first convolution block just after MaxPool.
This is the current implementation. Note the filter count of 116 on the first convolution block just after MaxPool2D.
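The printed summary is consistent with that: with bias, a 1x1 convolution has (in_channels + 1) × filters parameters, and plugging in the 24 channels coming out of maxpool1 reproduces the 2900 params shown above, while 58 filters would give half as many. (A small arithmetic check only; layer names and counts are taken from the summary in this thread.)

```python
in_channels = 24  # output width of maxpool1 in the summary above

# Params of a 1x1 conv with bias: (kh * kw * in_channels + 1) * filters.
def conv1x1_params(filters, in_channels=in_channels):
    return (1 * 1 * in_channels + 1) * filters

print(conv1x1_params(116))  # 2900, as printed for stage2/block1/1x1conv_1
print(conv1x1_params(58))   # 1450, what the PyTorch reference implies
```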
Resolving this is fairly simple. In utils.py, shuffle_unit currently reads:
def shuffle_unit(inputs, out_channels, bottleneck_ratio, strides=2, stage=1, block=1, relu6=False, weight_decay=0.0001):
    if K.image_data_format() == 'channels_last':
        bn_axis = -1
    else:
        raise ValueError('Only channels last supported')

    prefix = 'stage{}/block{}'.format(stage, block)
    bottleneck_channels = int(out_channels * bottleneck_ratio)
    ...
Change how bottleneck_channels is computed:
def shuffle_unit(inputs, out_channels, bottleneck_ratio, strides=2, stage=1, block=1, relu6=False, weight_decay=0.0001):
    if K.image_data_format() == 'channels_last':
        bn_axis = -1
    else:
        raise ValueError('Only channels last supported')

    prefix = 'stage{}/block{}'.format(stage, block)
    # Floor division keeps an integer channel count (plain / would return
    # a float in Python 3) and halves the unit's width for each branch.
    bottleneck_channels = out_channels // 2
    ...
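With the branch width computed as half of out_channels, the stage-2 block matches the PyTorch reference. A quick check over the 1x-model stage output widths (116/232/464, per the paper; the helper below is just for illustration):

```python
def bottleneck_width(out_channels):
    # Mirrors the corrected line: floor division keeps an integer
    # channel count and halves the width for each branch.
    return out_channels // 2

# Stage-2/3/4 output widths of the 1x ShuffleNetV2 model.
widths = {c: bottleneck_width(c) for c in (116, 232, 464)}
print(widths)  # {116: 58, 232: 116, 464: 232}
```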