keras-shufflenetV2

Is there a problem with the shuffle_unit function?

Open Hanhanhan11 opened this issue 6 years ago • 5 comments

In stage2, if output_channels is 116, both branches in this code output 116 channels, so stage2 ends up with 232 channels after the concat, right?

Hanhanhan11 avatar Nov 24 '18 12:11 Hanhanhan11

let me check

opconty avatar Nov 26 '18 10:11 opconty

The concat is fine, because there is a channel split later.

But I find that the total parameter count is larger than in the paper, and the shape of the 1st conv seems wrong. I'm wondering whether this model has been trained: could you reproduce the accuracy reported in the paper?

Total params: 5,043,740
Trainable params: 5,015,620
Non-trainable params: 28,120
input_1 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 112, 112, 24) 648         input_1[0][0]                    
__________________________________________________________________________________________________
maxpool1 (MaxPooling2D)         (None, 56, 56, 24)   0           conv1[0][0]                      
__________________________________________________________________________________________________
stage2/block1/1x1conv_1 (Conv2D (None, 56, 56, 116)  2900        maxpool1[0][0]                   
__________________________________________________________________________________________________
stage2/block1/bn_1x1conv_1 (Bat (None, 56, 56, 116)  464         stage2/block1/1x1conv_1[0][0]    
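
The per-layer counts in the summary above can be checked by hand. The sketch below is only arithmetic; it assumes a 3x3 kernel without bias for conv1 and a 1x1 kernel with bias for stage2/block1/1x1conv_1, which is inferred from the numbers rather than taken from the repository. The helper names are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_ch, out_ch, use_bias):
    """Weight count of a standard 2D convolution (hypothetical helper)."""
    weights = kernel_h * kernel_w * in_ch * out_ch
    return weights + (out_ch if use_bias else 0)


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: 4 values per channel."""
    return 4 * channels


# Assumption: conv1 is 3x3, 3 -> 24 channels, no bias.
conv1 = conv2d_params(3, 3, 3, 24, use_bias=False)
# Assumption: the first 1x1 conv of stage2 maps 24 -> 116 channels, with bias.
stage2_1x1 = conv2d_params(1, 1, 24, 116, use_bias=True)
# Batch normalization over 116 channels.
stage2_bn = batchnorm_params(116)

print(conv1, stage2_1x1, stage2_bn)  # 648 2900 464
```

Under these assumptions the three values match the Param # column, so the summary is consistent with a 116-filter 1x1 convolution right after maxpool1.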

cloudseasail avatar Feb 08 '19 00:02 cloudseasail

Hi cloudseasail: due to a lack of time and computing resources, I have not reproduced the results.

opconty avatar Feb 14 '19 09:02 opconty

where is the 1st conv shape wrong?

ArtemisZGL avatar Aug 20 '19 09:08 ArtemisZGL

Comparing this implementation to the official PyTorch implementation makes this clear.

[Screenshot: official PyTorch implementation] Note the filter count of 58 on the first convolution block just after MaxPool.

[Screenshot: current implementation] Note the filter count of 116 on the first convolution block just after MaxPool2D.

Resolving this is fairly simple. In utils.py:shuffle_unit,

def shuffle_unit(inputs, out_channels, bottleneck_ratio, strides=2, stage=1, block=1, relu6=False, weight_decay=0.0001):
    if K.image_data_format() == 'channels_last':
        bn_axis = -1
    else:
        raise ValueError('Only channels last supported')

    prefix = 'stage{}/block{}'.format(stage, block)
    bottleneck_channels = int(out_channels * bottleneck_ratio)
...

change how bottleneck_channels is computed:

def shuffle_unit(inputs, out_channels, bottleneck_ratio, strides=2, stage=1, block=1, relu6=False, weight_decay=0.0001):
    if K.image_data_format() == 'channels_last':
        bn_axis = -1
    else:
        raise ValueError('Only channels last supported')

    prefix = 'stage{}/block{}'.format(stage, block)
    bottleneck_channels = int(out_channels) // 2  # integer division keeps the filter count an int (58, not 58.0)
...
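
To see what the change does to channel widths, here is a rough bookkeeping sketch. It does not import the repository code. It assumes, based on the description in this thread, that a stride-2 unit concatenates two branches that each end with bottleneck_channels filters, and that a stride-1 unit splits its input in half and concatenates the untouched half with a bottleneck_channels-wide branch. bottleneck_ratio = 1 is also an assumption, and the function names are hypothetical.

```python
def stride2_out_channels(bottleneck_channels):
    # Assumed: both branches end in a conv with bottleneck_channels filters,
    # and their outputs are concatenated.
    return 2 * bottleneck_channels


def stride1_out_channels(in_channels, bottleneck_channels):
    # Assumed: channel split keeps half of the input as a shortcut; the
    # processed branch ends with bottleneck_channels filters.
    shortcut = in_channels // 2
    return shortcut + bottleneck_channels


def stage_widths(bottleneck_channels, repeats=3):
    """Output width of one stride-2 unit followed by `repeats` stride-1 units."""
    widths = [stride2_out_channels(bottleneck_channels)]
    for _ in range(repeats):
        widths.append(stride1_out_channels(widths[-1], bottleneck_channels))
    return widths


out_channels = 116
bottleneck_ratio = 1  # assumption, not confirmed by the thread

current = int(out_channels * bottleneck_ratio)  # 116 filters per branch
proposed = out_channels // 2                    # 58 filters per branch

print(stage_widths(current))   # [232, 232, 232, 232]
print(stage_widths(proposed))  # [116, 116, 116, 116]
```

If these assumptions hold, the current formula makes stage2 232 channels wide throughout, while halving gives the intended 116. Note that `int(out_channels) / 2` evaluates to the float 58.0 under Python 3, which is why the sketch uses `//`.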

lightb0x avatar Dec 30 '20 14:12 lightb0x